I have a dictionary that's in the following format:
mydict = {'item1': ['label1_item', 'label2_item', 'label3_item', 'label4_item'], ...
'item999': ['label1_item999', 'label2_item999', 'label3_item999', 'label4_item999']}
This is how I'm currently outputting the dictionary:
filename = datetime.now().strftime('output_-%Y-%m-%d-%H-%M.csv')
df = pd.DataFrame(mydict)
df.to_csv(filename, encoding='utf-8', header=['label1', 'label2', 'label3', 'label4'], sep=',')
I want to label the first column "item", but I can't: I only have labels for column 2 (label1) through column 5 (label4). How do I modify my script to do so?
It's not clear what you want, so I am assuming you want the rows labeled ['label1', 'label2', 'label3', 'label4'] and the columns labeled ['item1', 'item999'].
Set the index:
df.index = ['label1', 'label2', 'label3', 'label4']
Save:
df.to_csv(filename, encoding='utf-8', sep=',', header=['item1', 'item999'])
Edit:
Based on your comment:
Your dataframe needs to be transposed:
df = pd.DataFrame(mydict).T
which yields:
0 1 2 3
item1 label1_item label2_item label3_item label4_item
item999 label1_item999 label2_item999 label3_item999 label4_item999
then save:
df.to_csv(filename, encoding='utf-8', header=['label1', 'label2', 'label3', 'label4'], sep=',')
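Putting it together, a minimal end-to-end sketch (assuming a small `mydict` shaped as above; `index_label='item'` names the first column directly, which answers the original question):

```python
from datetime import datetime
import pandas as pd

mydict = {'item1': ['label1_item', 'label2_item', 'label3_item', 'label4_item'],
          'item999': ['label1_item999', 'label2_item999', 'label3_item999', 'label4_item999']}

filename = datetime.now().strftime('output_-%Y-%m-%d-%H-%M.csv')

# Transpose so each dict key becomes a row, then name the index column "item"
df = pd.DataFrame(mydict).T
df.to_csv(filename, encoding='utf-8',
          header=['label1', 'label2', 'label3', 'label4'],
          index_label='item')
```

The header row of the file then starts with `item`, followed by the four labels.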
Related
I have a dataframe containing 4 columns. I want to use 2 of the columns as keys for a dictionary of dictionaries, where the inner values are dataframes built from the remaining 2 columns.
birdies = pd.DataFrame({'Habitat' : ['Captive', 'Wild', 'Captive', 'Wild'],
'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot'],
'Max Speed': [380., 370., 24., 26.],
'Color': ["white", "grey", "green", "blue"]})
#this should output speed and color
birdies_dict["Falcon"]["Wild"]
#this should contain a dictionary whose keys are 'Captive' and 'Wild'
birdies_dict["Falcon"]
I have found a way to generate a dictionary of dataframes with a single column as a key, but not with 2 columns as keys:
birdies_dict = {k:table for k,table in birdies.groupby("Animal")}
I suggest using defaultdict for this; a solution for the two-column problem is:
from collections import defaultdict
d = defaultdict(dict)
for (hab, ani), _df in birdies.groupby(['Habitat', 'Animal']):
    d[hab][ani] = _df
This breaks beyond two key columns; if you want a higher depth, you can just define a recursive defaultdict:
from collections import defaultdict
recursive_dict = lambda: defaultdict(recursive_dict)
dct = recursive_dict()
dct[1][2][3] = ...
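For example, the same nested structure from the question can be filled at any depth without pre-creating intermediate dicts (a sketch with plain values at the leaves):

```python
from collections import defaultdict

# each missing key creates another defaultdict of the same kind
recursive_dict = lambda: defaultdict(recursive_dict)

d = recursive_dict()
d['Falcon']['Wild']['Max Speed'] = 370.0
d['Falcon']['Captive']['Max Speed'] = 380.0

print(d['Falcon']['Wild']['Max Speed'])  # 370.0
print(sorted(d['Falcon']))               # ['Captive', 'Wild']
```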
Call to_dict on the inner dataframes:
birdies_dict = {k:d.to_dict() for k,d in birdies.groupby('Animal')}
birdies_dict['Falcon']['Habitat']
Output:
{0: 'Captive', 1: 'Wild'}
Or do you mean:
out = birdies.set_index(['Animal','Habitat'])
out.loc[('Falcon','Captive')]
which gives:
Max Speed 380
Color white
Name: (Falcon, Captive), dtype: object
IIUC:
birdies_dict = {k: {habitat: table.loc[table['Habitat'] == habitat, ['Max Speed', 'Color']].to_numpy() for habitat in table['Habitat'].unique()} for k, table in birdies.groupby("Animal")}
OR
birdies_dict = {k: {habitat: table.loc[table['Habitat'] == habitat, ['Max Speed', 'Color']] for habitat in table['Habitat'].unique()} for k, table in birdies.groupby("Animal")}
#in this case the inner key holds a dataframe
OR
birdies_dict = {k: {inner_key: inner_table for inner_key, inner_table in table.groupby('Habitat')} for k, table in birdies.groupby("Animal")}
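A quick check of the nested-groupby variant against the example data (a sketch; each inner value is a one-row dataframe for this dataset):

```python
import pandas as pd

birdies = pd.DataFrame({'Habitat': ['Captive', 'Wild', 'Captive', 'Wild'],
                        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                        'Max Speed': [380., 370., 24., 26.],
                        'Color': ["white", "grey", "green", "blue"]})

# outer key: Animal, inner key: Habitat, inner value: the sub-dataframe
birdies_dict = {k: {inner_key: inner_table
                    for inner_key, inner_table in table.groupby('Habitat')}
                for k, table in birdies.groupby("Animal")}

falcon_wild = birdies_dict["Falcon"]["Wild"]
print(falcon_wild[['Max Speed', 'Color']])
```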
From a csv file (initial.csv):
"Id","Name"
1,"CLO"
2,"FEV"
2,"GEN"
3,"HYP"
4,"DIA"
1,"COL"
1,"EOS"
4,"GAS"
1,"AEK"
I am grouping by the Id column and aggregating the Name column values so that each unique Id has all the Name values appended on the same row (new.csv):
"Id","Name"
1,"CLO","COL","EOS","AEK"
2,"FEV","GEN"
3,"HYP"
4,"DIA","GAS"
Now some rows have extra Name values, for which I want to append corresponding columns according to the maximum count of Name values that exist on any row, i.e.
"Id","Name","Name2","Name3","Name4"
1,"CLO","COL","EOS","AEK"
2,"FEV","GEN"
3,"HYP"
4,"DIA","GAS"
I do not understand how I can add new columns on dataframe to match the data.
Below is my code:
import pandas as pd

df = pd.read_csv('initial.csv', delimiter=',')

max_names_count = 0
for id in df['Id'].unique():
    mask = df['Id'] == id
    names_count = len(df[mask])
    if names_count > max_names_count:
        max_names_count = names_count

group_by_id = df.groupby(["Id"]).agg({"Name": ','.join})
# Create new columns 'Id', 'Name', 'Name2', 'Name3', 'Name4'
new_column_names = ["Id", "Name"] + ['Name' + str(i) for i in range(2, max_names_count + 1)]
group_by_id.columns = new_column_names  # <-- ValueError: Length mismatch: Expected axis has 1 elements, new values have 5 elements
group_by_id.to_csv('new.csv', encoding='utf-8')
Try:
df = pd.read_csv("initial.csv")
df_out = (
    df.groupby("Id")["Name"]
    .agg(list)
    .to_frame()["Name"]
    .apply(pd.Series)
    .rename(columns=lambda x: "Name" if x == 0 else "Name{}".format(x + 1))
    .reset_index()
)
df_out.to_csv("out.csv", index=False)
Creates out.csv:
Id,Name,Name2,Name3,Name4
1,CLO,COL,EOS,AEK
2,FEV,GEN,,
3,HYP,,,
4,DIA,GAS,,
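An alternative sketch that avoids the relatively slow `.apply(pd.Series)` step by building the wide frame from the lists directly (the initial.csv data is inlined here for self-containment; column names chosen to match the desired output):

```python
import pandas as pd

# same rows as initial.csv
df = pd.DataFrame({"Id": [1, 2, 2, 3, 4, 1, 1, 4, 1],
                   "Name": ["CLO", "FEV", "GEN", "HYP", "DIA",
                            "COL", "EOS", "GAS", "AEK"]})

# one list of names per Id; shorter lists are padded with NaN below
s = df.groupby("Id")["Name"].agg(list)
wide = pd.DataFrame(s.tolist(), index=s.index)
wide.columns = ["Name"] + ["Name{}".format(i) for i in range(2, wide.shape[1] + 1)]
wide.reset_index().to_csv("out.csv", index=False)
```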
I have two lists
ids = [1,2,3]
values = [10,20,30]
I need to create a tsv file with two columns - id and result and put the ids and values in there. The output should look like this
id result
1 10
2 20
3 30
I wrote below code
import csv

output_columns = ['id','result']
data = zip(output_columns, ids, values)
with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t')
    tsv_output.writerow(data)
But this gives me an output like below which is wrong
('id', '1', '10') ('result', '2','20')
I understand that this wrong output comes from the way I used zip to create a row of data, but I am not sure how to fix it.
Please suggest.
import csv

output_columns = ['id','result']
data = zip(ids, values)
with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t')
    tsv_output.writerow(output_columns)
    for id, val in data:
        tsv_output.writerow([id, val])
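Equivalently, `csv.writer.writerows` can take the zipped pairs directly, avoiding the explicit loop (a compact sketch of the same idea):

```python
import csv

ids = [1, 2, 3]
values = [10, 20, 30]

with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t')
    tsv_output.writerow(['id', 'result'])   # header row
    tsv_output.writerows(zip(ids, values))  # one row per (id, value) pair
```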
It's easier with pandas:
In [8]: df = pd.DataFrame({"ids":[1,2,3], "values":[10,20,30]})
In [9]: df
Out[9]:
ids values
0 1 10
1 2 20
2 3 30
In [10]: df.to_csv("data.tsv", sep="\t", index=False)
I am trying to wrap text in pandas dataframe columns, but this code wraps the values in the columns and not the column headers.
I am using the code below (taken from Stack Overflow). How can I wrap the dataframe header?
import string
import pandas as pd

long_text = 'aa aa ss df fff ggh ttr tre ww rr tt ww errr t ttyyy eewww rr55t e'
data = {'a': [long_text, long_text, 'a'],
        'c': [long_text, long_text, long_text],
        'b': [1, 2, 3]}
df = pd.DataFrame(data)

#choose columns of df for wrapping
cols_for_wrap = ['a', 'c']

writer = pd.ExcelWriter('aaa.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)

#modifying output by style - wrap
workbook = writer.book
worksheet = writer.sheets['Sheet1']
wrap_format = workbook.add_format({'text_wrap': True})

#map column positions to Excel letters
d = dict(zip(range(26), string.ascii_uppercase))

#get positions of columns
for col in df.columns.get_indexer(cols_for_wrap):
    #map by dict to a range like "A:A"
    excel_header = d[col] + ':' + d[col]
    #None means the column width is not set
    worksheet.set_column(excel_header, None, wrap_format)
    #for width = 10
    worksheet.set_column(excel_header, 10, wrap_format)

writer.save()
In the header_format snippet that jmcnamara linked, you can add or remove whichever formats you want:
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})

# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
My code looks like this:
h_format = workbook.add_format({'text_wrap': True})
...
...
...
for col_num, value in enumerate(df_new.columns.values):
    worksheet.write(0, col_num, value, h_format)
writer.save()
This is covered almost exactly in the Formatting of the Dataframe headers section of the XlsxWriter docs.
I have a list of countries such as:
country = ["Brazil", "Chile", "Colombia", "Mexico", "Panama", "Peru", "Venezuela"]
I created data frames using the names from the country list:
for c in country:
    c = pd.read_excel(str(c + ".xls"), skiprows=1)
    c = pd.to_datetime(c.Date, infer_datetime_format=True)
    c = c[["Date", "spreads"]]
Now I want to merge all the countries' data frames using the Date column as the key. The idea is to create a loop like the following:
df = Brazil  # this is the first dataframe, which also corresponds to the first element of the country list
for i in range(len(country)-1):
    df = df.merge(country[i+1], on="Date", how="inner")
df.set_index("Date", inplace=True)
I got the error ValueError: can not merge DataFrame with instance of type <class 'str'>. It seems Python is not resolving the dataframes from the names in the country list. How can I refer to those dataframes starting from the country list?
Thanks masters!
Your loop doesn't modify the contents of the country list, so country is still a list of strings.
Consider building a new list of dataframes and looping over that:
country_dfs = []
for c in country:
    df = pd.read_excel(c + ".xls", skiprows=1)
    df["Date"] = pd.to_datetime(df["Date"], infer_datetime_format=True)
    df = df[["Date", "spreads"]]
    # add the new dataframe to our list of dataframes
    country_dfs.append(df)
then to merge,
merged_df = country_dfs[0]
for df in country_dfs[1:]:
    merged_df = merged_df.merge(df, on='Date', how='inner')
merged_df.set_index('Date', inplace=True)
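The same chain of inner merges can also be written with `functools.reduce` (a sketch with toy frames standing in for the per-country data; column names here are illustrative):

```python
from functools import reduce
import pandas as pd

# toy stand-ins for the per-country dataframes
brazil = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'], 'spreads': [1.0, 1.1]})
chile = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'], 'spreads': [2.0, 2.1]})
country_dfs = [brazil, chile]

# fold the list of frames into one frame with successive inner merges
merged_df = reduce(lambda left, right: left.merge(right, on='Date', how='inner'),
                   country_dfs)
merged_df = merged_df.set_index('Date')
```

Note that with identical column names pandas appends `_x`/`_y` suffixes on merge; in practice you would rename `spreads` per country (e.g. `spreads_Brazil`) before merging.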