Access third value of first key in dictionary python - python-3.x

I have created a dictionary where one key has multiple values - start_time_C, duration_pre_val, value_T. All are input from an excel sheet.
Then I have sorted the dictionary.
import operator

pre_dict = {}
pre_dict.setdefault(rows, []).append(start_time_C)
pre_dict.setdefault(rows, []).append(duration_pre_val)
pre_dict.setdefault(rows, []).append(value_T)
pre_dict_sorted = sorted(pre_dict.items(), key=operator.itemgetter(1))
Now, I want to compare a value (Column T of the excel sheet) with value_T.
How do I access value_T from the dictionary?
Many thanks!

Let's break this into two parts:
Reading in the spreadsheet
I/O stuff like this is best handled with pandas; if you'll be working with spreadsheets and other tabular data in Python, get acquainted with this package. You can do something like
import pandas as pd
#read the excel file into a pandas dataframe
my_data = pd.read_excel('/your/path/filename.xlsx', sheet_name='Sheet1')
Accessing elements of the data, creating a dict
Your spreadsheet's content is now in the pandas DataFrame "my_data". From here you can reference DataFrame elements like this
#pandas: entire column
my_data['value_T']
#pandas: row at position 2, column at position 0
my_data.iloc[2, 0]
and create Python data structures
#create a dict from the dataframe, keyed by column name
my_dict = my_data.to_dict()
#access the values associated with the 'value_T' key of the dict
my_dict['value_T']
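Alternatively, sticking with the original dict-of-lists structure: since each key's list is built by appending start_time_C, duration_pre_val, value_T in that order, value_T always sits at index 2. A small sketch with made-up stand-in values:

```python
# Hypothetical stand-in values for the spreadsheet columns
row = 'row_1'  # stand-in for the `rows` key
start_time_C, duration_pre_val, value_T = '08:00', 45, 3.7

pre_dict = {}
pre_dict.setdefault(row, []).append(start_time_C)
pre_dict.setdefault(row, []).append(duration_pre_val)
pre_dict.setdefault(row, []).append(value_T)

# value_T was appended third, so it is at index 2
print(pre_dict[row][2])  # 3.7

# after sorting, each item is a (key, [values]) tuple
pre_dict_sorted = sorted(pre_dict.items(), key=lambda kv: kv[1])
first_key, first_values = pre_dict_sorted[0]
print(first_values[2])   # 3.7
```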

Related

Python data source - first two columns disappear

I have started using PowerBI and am using Python as a data source with the code below. The source data can be downloaded from here (it's about 700 megabytes). The data is originally from here (contained in IOT_2019_pxp.zip).
import pandas as pd
import numpy as np
import os
path = '/path/to/file'
to_chunk = pd.read_csv(os.path.join(path, 'A.txt'), delimiter='\t', header=[0,1], index_col=[0,1],
                       iterator=True, chunksize=1000)
def chunker(to_chunk):
    to_concat = []
    for chunk in to_chunk:
        try:
            to_concat.append(chunk['BG'].loc['BG'])
        except KeyError:
            pass
    return to_concat
A = pd.concat(chunker(to_chunk))
I = np.identity(A.shape[0])
L = pd.DataFrame(np.linalg.inv(I-A), index=A.index, columns=A.columns)
The code simply:
Loads the file A.txt, which is a symmetrical matrix. This matrix has every sector in every region for both rows and columns. In pandas, these form a MultiIndex.
Filters just the region that I need which is BG. Since it's a symmetrical matrix, both row and column are filtered.
Calculates the inverse of (I - A), giving us L, which I want to load into PowerBI. This matrix now just has a single regular Index for sector.
This is all well and good; however, when I load the data into PowerBI, the first column (the sector names for each row, i.e. the DataFrame index) disappears. When the query gets processed, it is as if it were never there. This is true for both dataframes A and L, so it's not an issue of data processing. The column of row names (the DataFrame index) is still there in Python; PowerBI just drops it for some reason.
I need this column so that I can link these tables to other tables in my data model. Any ideas on how to keep it from disappearing at load time?
For what it's worth, calling reset_index() removed the index from the dataframes and they got loaded like regular columns. For whatever reason, PBI does not properly load pandas indices.
For a regular 1D index, I had to do S.reset_index().
For a MultiIndex, I had to do L.reset_index(inplace=True).
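A minimal sketch of what reset_index does, using a small made-up frame in place of the IOT data: it moves every index level into ordinary columns, which PowerBI then loads like any other column.

```python
import pandas as pd

# small made-up frame with a two-level index, like A in the question
idx = pd.MultiIndex.from_product([['BG'], ['agri', 'mining']],
                                 names=['region', 'sector'])
A = pd.DataFrame({'x': [1.0, 2.0]}, index=idx)

# reset_index turns both index levels into regular columns
flat = A.reset_index()
print(flat.columns.tolist())  # ['region', 'sector', 'x']
```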

Filling new column with information from a list in multiple excel files using Python

I need to add a column to 40 excel files. The new column in each file will be filled with a name.
This is what I have:
files=['16686_Survey.xlsx', '16687_Survey.xlsx', '16772_Survey.xlsx', ...] (40 files with more than 200 rows each)
filenames=['name1', 'name2', 'name3', ...] (40 names)
I need to add a column to each excel file and write its corresponding name along the new column.
With the following code I got what I need for one file.
import pandas as pd
df = pd.read_excel('16686_Survey.xlsx')
df.insert(0, "WellName", "Name1")
writer = pd.ExcelWriter('16686_Survey.xlsx')
df.to_excel(writer, index = False)
writer.save()
But it would be inefficient if I do it 40 times, and I would like to learn how to use a loop to address this type of problem because I have been in the same situation many times.
The image is what I got with the code above. The first table in the image is what I have. The second table is what I want
Thank you for your help!
I'm not 100% sure I understand your question, but I think you're looking for this:
import pandas as pd
files=['16686_Survey.xlsx', '16687_Survey.xlsx', '16772_Survey.xlsx', ...]
filenames=['name1', 'name2', 'name3', ...]
for excel_file, other_name in zip(files, filenames):
    df = pd.read_excel(excel_file)
    df.insert(0, "WellName", other_name)
    with pd.ExcelWriter(excel_file) as writer:
        df.to_excel(writer, index=False)
I combined both lists (I assumed they were the same length) using the zip function. zip takes items from the lists in parallel and pairs them up: the first items together, the second items together, and so on.
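For example, here is zip on a shortened pair of those lists, pairing file with name:

```python
files = ['16686_Survey.xlsx', '16687_Survey.xlsx']
filenames = ['name1', 'name2']

# zip pairs the lists element by element
pairs = list(zip(files, filenames))
print(pairs)
# [('16686_Survey.xlsx', 'name1'), ('16687_Survey.xlsx', 'name2')]
```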

Issue when exporting dataframe to csv

I'm working on a mechanical engineering project. For the following code, the user enters the number of cylinders that their compressor has. A dataframe is then created with the correct number of columns and is exported to Excel as a CSV file.
The outputted dataframe looks exactly like I want it to as shown in the first link, but when opened in Excel it looks like the image in the second link:
1.my dataframe
2.Excel Table
Why is my dataframe not exporting properly to Excel and what can I do to get the same dataframe in Excel?
import pandas as pd
CylinderNo=int(input('Enter CylinderNo: '))
new_number=CylinderNo*3
list1=[]
for i in range(1, CylinderNo+1):
    for j in range(0, 3):
        Cylinder_name = 'CylinderNo ' + str(i)
        list1.append(Cylinder_name)
df = pd.DataFrame(list1,columns =['Kurbel/Zylinder'])
list2=['Triebwerk', 'Packung','Ventile']*CylinderNo
Bauteil = {'Bauteil': list2}
df2 = pd.DataFrame (Bauteil, columns = ['Bauteil'])
new=pd.concat([df, df2], axis=1)
list3=['Nan','Nan','Nan']*CylinderNo
Bewertung={'Bewertung': list3}
df3 = pd.DataFrame (Bewertung, columns = ['Bewertung'])
new2=pd.concat([new, df3], axis=1)
Empfehlung={'Empfehlung': list3}
df4 = pd.DataFrame (Empfehlung, columns = ['Empfehlung'])
new3=pd.concat([new2, df4], axis=1)
new3 = new3.set_index('Kurbel/Zylinder', append=True).swaplevel(0, 1)
#export dataframe to csv
new3.to_csv('new3.csv')
To be clear, a comma-separated values (CSV) file is not an Excel format type or table. It is a delimited text file that Excel, like other applications, can open.
What you are comparing is simply presentation. Both data frames are exactly the same. For MultiIndex data frames, Pandas' print output does not repeat index values, for readability on the console or in an IDE like Jupyter. But such values are not removed from the underlying data frame, only from its presentation. If you re-order the indexes, you will see this presentation change. The full, complete data frame is what gets exported to CSV. And ideally, for data integrity, you want the full data set exported with to_csv to be importable back into Pandas with read_csv (which can set indexes) or into other languages and applications.
Essentially, CSV is an industry format to store and transfer data. Consider using Excel spreadsheets, HTML markdown, or other reporting formats for your presentation needs; to_csv may not be the best method. You can try to build the text file manually with Python I/O write methods (with open('new.csv', 'w') as f), but that would be an extensive workaround. See also @Jeff's answer here, but do note that the latter part of that solution does remove data.
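To see the difference between presentation and data, here is a small stand-in for the new3 frame (made-up values): the console display shows the first-level index label only once, but the CSV text carries it on every row, so nothing is lost on export.

```python
import pandas as pd

# small stand-in for the new3 frame in the question
idx = pd.MultiIndex.from_product(
    [['CylinderNo 1'], ['Triebwerk', 'Packung', 'Ventile']],
    names=['Kurbel/Zylinder', 'Bauteil'])
report = pd.DataFrame({'Bewertung': ['a', 'b', 'c']}, index=idx)

# the console display suppresses repeated index labels for readability,
# but the CSV text repeats 'CylinderNo 1' on every data row
csv_text = report.to_csv()
print(csv_text.count('CylinderNo 1'))  # 3
```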

Python3 - Return CSV with row-level errors for missing data

New to Python. I'm importing a CSV, then if any data is missing I need to return a CSV with an additional column to indicate which rows are missing data. Colleague suggested that I import CSV into a dataframe, then create a new dataframe with a "Comments" column, fill it with a comment on the intended rows, and append it to the original dataframe. I'm stuck at the step of filling my new dataframe, "dferr", with the correct number of rows that would match up to "dfinput".
Have Googled, "pandas csv return error column where data is missing", but haven't found anything related to creating a new CSV that marks bad rows. I don't even know if the proposed way is the best way to go about this.
import pandas as pd
dfinput = None
try:
    dfinput = pd.read_csv(r"C:\file.csv")
except:
    print("Uh oh!")
if dfinput is None:
    print("Ack!")
    quit(10)
dfinput.reset_index(level=None, drop=False, inplace=True, col_level=0,
                    col_fill='')
dferr = pd.DataFrame(columns=['comment'])
print("Empty DataFrame", dferr, sep='\n')
Expected results: "dferr" would have an index column with number of rows equal to "dfinput", and comments on the correct rows where "dfinput" has missing values.
Actual results: "dferr" is empty.
My understanding of 'missing data' here would be null values. It seems that for every row, you want the names of null fields.
df = pd.DataFrame([[1, 2, 3],
                   [4, None, 6],
                   [None, 8, None]],
                  columns=['foo', 'bar', 'baz'])
# Create a dataframe of True/False, True where a criterion is met
# (in this case, a null value)
nulls = df.isnull()
# Iterate through every row of *nulls*,
# and extract the column names where the value is True by boolean indexing
colnames = nulls.columns
null_labels = nulls.apply(lambda s:colnames[s], axis=1)
# Now you have a pd.Series where every entry is an array
# (technically, a pd.Index object)
# Pandas arrays have a vectorized .str.join method:
df['nullcols'] = null_labels.str.join(', ')
The .apply() method in pandas can sometimes be a bottleneck in your code; there are ways to avoid using this, but here it seemed to be the simplest solution I could think of.
EDIT: Here's an alternate one-liner (instead of using .apply) that might cut down computation time slightly:
import numpy as np
df['nullcols'] = [colnames[x] for x in nulls.values]
This might be even faster (a bit more work is required):
np.where(df.isnull(),df.columns,'')
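That np.where call gives a 2D array with the column name where a value is null and '' elsewhere; joining each row's names back into a string column could look like this, using the same toy frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [4, None, 6],
                   [None, 8, None]],
                  columns=['foo', 'bar', 'baz'])

# 2D array: column name where the value is null, '' elsewhere
marks = np.where(df.isnull(), df.columns, '')

# join each row's names, dropping the empty slots
df['nullcols'] = [', '.join(name for name in row if name) for row in marks]
print(df['nullcols'].tolist())  # ['', 'bar', 'foo, baz']
```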

Create string from all row elements pandas

I have a csv file and I would like to create a string with all the elements of each row. Let's say that I have the following csv...
trump,clinton
google,microsoft,linkedin
linux,windows,osx
data science,operating systems
I would like to create a string like so: trump&clinton | google&microsoft&linkedin, and so forth. I did import the file and create a df with pandas. The solution doesn't have to use pandas; if it can be done with the csv module, that is acceptable as well.
I need one string per row... each row will become its own string.
Try
df.apply('&'.join, axis=1)
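One caveat: since the rows have different lengths, reading the file with pandas pads the short rows with NaN, which breaks '&'.join. The csv module sidesteps that. A sketch with the sample data inlined instead of a file on disk:

```python
import csv
import io

# the sample rows from the question, inlined instead of read from a file
raw = """trump,clinton
google,microsoft,linkedin
linux,windows,osx
data science,operating systems
"""

# csv.reader yields each row as a list of its actual fields,
# with no padding for shorter rows
rows = ['&'.join(row) for row in csv.reader(io.StringIO(raw))]
print(rows[0])           # trump&clinton
print(' | '.join(rows))  # trump&clinton | google&microsoft&linkedin | ...
```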
