I want to write a script that reads several .txt and .ASC data files and runs all of them through the same functions, so that the processing happens automatically.
The .txt files contain more columns (product, number, color, size) than the .ASC files (product, number, size), so I have to adjust the header handling for each type.
This is the first part of what I thought my script could look like:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import new_methods as nem
import sys
sys.path.append("../../src/")
path_data ="C:///Users///"
fids = [file for file in os.listdir(path_data)]
d = dict()
for i in fids:
    if i[-1]== 't':
        d.update({i : nem.df(path_data+i, header_lines=1)})
    elif i[-1] == 'C':
        d.update({i : nem.df(path_data+i, header_lines=0)})
for val in d.values():
    txt_fid=d[val]
    dh_txt=nem.sort(txt_fid)
But it raises a TypeError:
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
It does work if I change the last part to
txt_fid=d['specific.txt']
dh_txt=nem.sort(txt_fid)
But this way I have to change the file name manually for every .txt file.
As the error says, a dictionary key must be hashable, and a mutable object such as a DataFrame is not. That is what happens in d[val]: because you iterate over d.values(), val is a DataFrame, and d[val] tries to use it as a key.
Did you mean to use the values of the dictionary, or did you want the keys? Or perhaps some element of the DataFrame?
If you want the keys rather than the values, iterate with for val in d: instead; d[val] then returns the DataFrame stored under that file name.
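If both the file name and the DataFrame are needed inside the loop, iterating over d.items() avoids the lookup altogether. The following is only a sketch: sort_frame is a hypothetical stand-in for nem.sort (whose behaviour is not shown in the question), and the two small DataFrames are made-up sample data.

```python
import pandas as pd


def sort_frame(frame):
    """Hypothetical stand-in for nem.sort: sort rows by the 'number' column."""
    return frame.sort_values("number").reset_index(drop=True)


# Made-up sample data in place of the files loaded with nem.df
d = {
    "a.txt": pd.DataFrame({"product": ["x", "y"], "number": [2, 1]}),
    "b.ASC": pd.DataFrame({"product": ["z", "w"], "number": [4, 3]}),
}

sorted_frames = {}
# items() yields (key, value) pairs, so no d[...] lookup is needed
for name, frame in d.items():
    sorted_frames[name] = sort_frame(frame)
```

Keeping the results in a second dictionary keyed by file name means every processed sheet stays available after the loop, rather than only the last one.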
Related
[data set image] Please use Python. I'm a beginner in frequent data mining and I'm trying to understand it, so please be as simple and detailed as possible.
I tried using a for loop to collect data from a range, but I'm still learning and don't know how to implement it; it keeps giving me the error "index 1 is out of bounds for axis 1 with size 1". Please guide me.
NB: I was also trying to construct a data frame, but I don't know how. Please help me with that too.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import csv
# Reading the file into a DataFrame
Data = pd.read_csv('retail.txt', header = None)
# Initializing the list
transacts = []
# populating a list of transactions
for i in range(1, 9):
    transacts.append([str(Data.values[i,j]) for j in range(1, 2000)])
df = pd.DataFrame()
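The error message says that Data.values has only one column, so any column index other than 0 is out of bounds. One possible cause, assuming retail.txt holds one transaction per line with items separated by spaces rather than commas, is that read_csv puts each whole line into a single cell. Under that assumption, a minimal sketch is to split each line yourself; the three sample lines below are made up and stand in for the real file.

```python
import io

import pandas as pd

# Made-up stand-in for open('retail.txt'); one transaction per line
sample = io.StringIO("1 2 3\n2 4\n1 3 5 7\n")

# Split every non-empty line on whitespace: one list of items per transaction
transactions = [line.split() for line in sample if line.strip()]

# One row per transaction; shorter transactions are padded with missing values
df = pd.DataFrame(transactions)
```

With the real file, replacing sample with open('retail.txt') should give the same structure, and transactions can be sliced (for example transactions[:8]) instead of indexing with fixed ranges.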
I have over 50 CSV files to process, and each dataset looks like this:
[a simplified csv file example] (https://i.stack.imgur.com/yoGo9.png)
There are three columns: Times, Index, Voltage
For index 2, I want to interpolate the time at which the voltage has fallen by 53% of its total decrease [here the total decrease is (84-69) = 15, so 53% means 15*0.53].
I will repeat this process for index 4, too. What should I do?
I am a beginner in Python and tried the following script:
source code (
import pandas as pd
import glob
import numpy as np
import os
import matplotlib.pyplot as plt
import xlwings as xw
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv")) # read all csv files
for f in csv_files: # process each dataset
    df = pd.read_csv(f)
    ta95 = df[df.Step_Index == 2] # create new dataframe based on index
    a = ta95.iloc[0] # choose first row (iloc is zero-based)
    b = a["Voltage(V)"] # save voltage in first row
    c = ta95.iloc[-1] # choose last row
    d = c["Voltage(V)"] # save voltage in last row
    e = (b - d) * 0.53 # get 53% of the voltage decrease
)
I don't know what I should do next in this script.
I would appreciate your time and support if you can help.
If you can recommend any websites that would help me solve this kind of problem, I would appreciate that too. Thanks again.
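One way to carry out the interpolation step described above is linear interpolation with numpy.interp. This is a sketch under assumptions: the voltage falls monotonically within the step, the time column is called "Times" (the name used in the description, not confirmed by the script), and the intermediate sample values are made up around the 84 and 69 from the question.

```python
import numpy as np
import pandas as pd


def time_at_fractional_drop(step, fraction=0.53,
                            time_col="Times", volt_col="Voltage(V)"):
    """Interpolate the time at which the voltage has fallen by
    `fraction` of its total decrease within one step."""
    t = step[time_col].to_numpy(dtype=float)
    v = step[volt_col].to_numpy(dtype=float)
    target = v[0] - fraction * (v[0] - v[-1])
    # np.interp needs increasing x-values; the voltage falls,
    # so both arrays are reversed before interpolating
    return float(np.interp(target, v[::-1], t[::-1]))


# Made-up sample: voltage drops from 84 to 69 during step 2
df = pd.DataFrame({
    "Times": [0.0, 1.0, 2.0, 3.0],
    "Step_Index": [2, 2, 2, 2],
    "Voltage(V)": [84.0, 79.0, 74.0, 69.0],
})
step2 = df[df["Step_Index"] == 2]
t53 = time_at_fractional_drop(step2)
```

Inside the loop over csv_files, the same function could be called once for Step_Index 2 and once for Step_Index 4, collecting the results in a list.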
I want to read a CSV file (filled in by temperature sensors) with Python 3.
Reading the CSV file into an array works fine, but printing a single cell by index fails. Please help me with the right line of code.
This is the code:
import sys
import pandas as pd
import numpy as np
import array as ar
#Reading the CSV File
# date;seconds;Time;Aussen-Temp;Ruecklauf;Kessel;Vorlauf;Diff
# 20211019;0;20211019;12,9;24;22,1;24,8;0,800000000000001
# ...
# ... (2800 rows in total)
np = pd.read_csv('/var/log/LM92Temperature_U-20211019.csv',
                 header=0,
                 sep=";",
                 usecols=['date','seconds','Time','Aussen-Temp','Ruecklauf','Kessel','Vorlauf','Diff'])
br = np # works fine
print (br) # works fine - prints whole CSV Table :-) !
#-----------------------------------------------------------
# Now I want to print the element [2] [3] of the two dimensional "CSV" array ... How to manage that ?
print (br [2] [3]) # ... ends up with an error ...
# what is the correct coding needed now, please?
Thanks in advance & Regards
Give the name of the column, not the index:
print(br['Time'][3])
As an aside, you can read your data with just the following; decimal=',' makes pandas parse values such as 12,9 as numbers:
import pandas as pd
br = pd.read_csv('/var/log/LM92Temperature_U-20211019.csv', sep=';', decimal=',')
print(br)
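If the goal is really to address a cell by row and column position, as in br [2] [3], pandas offers iloc for positional access and loc for label-based access; plain br[2] looks for a column named 2, which does not exist here. A small sketch follows. The header and the first data row come from the question, while the other two rows are made up so that row 2 exists.

```python
import io

import pandas as pd

csv_text = (
    "date;seconds;Time;Aussen-Temp;Ruecklauf;Kessel;Vorlauf;Diff\n"
    "20211019;0;20211019;12,9;24;22,1;24,8;0,800000000000001\n"
    "20211019;30;20211019;12,8;24,1;22,0;24,9;0,8\n"  # made-up row
    "20211019;60;20211019;12,7;24,2;22,2;25,0;0,8\n"  # made-up row
)

br = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Positional access: row 2, column 3 (both zero-based)
by_position = br.iloc[2, 3]

# Label-based access to the same cell
by_label = br.loc[2, "Aussen-Temp"]
```

Both variables refer to the Aussen-Temp value in the third data row.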
I am trying to run the script below to add two columns to the left of a file; however, it keeps giving me
ValueError: header must be integer or list of integers
Below is my code:
import pandas as pd
import numpy as np
read_file = pd.read_csv("/home/ex.csv",header='true')
df=pd.DataFrame(read_file)
def add_col(x):
    df.insert(loc=0, column='Creation_DT', value=pd.to_datetime('today'))
    df.insert(loc=1, column='Creation_By', value="Sean")
    df.to_parquet("/home/sample.parquet")
add_col(df)
Is there a way to make the Creation_DT column a string?
According to the pandas docs, header is the row number(s) to use as the column names and the start of the data, and it must be an int or a list of ints. So you have to pass header=0 to read_csv.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Also, pandas automatically creates a DataFrame from the file it reads, so you don't need to construct one yourself. Just use
df = pd.read_csv("/home/ex.csv", header=0)
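To see what the integer header argument does, the following sketch reads the same made-up two-column text twice, once with header=0 and once with header=None. The sample content is an assumption and unrelated to the real /home/ex.csv.

```python
import io

import pandas as pd

csv_text = "id,name\n1,alpha\n2,beta\n"  # made-up sample content

# header=0: the first line supplies the column names
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every line is data and columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
```

Here with_header has two data rows and columns id and name, while no_header has three rows because the first line is treated as data.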
You can try:
import pandas as pd
import numpy as np
df = pd.read_csv("/home/ex.csv")  # by default the first row is used as the header
def add_col(df):
    # str(...) stores the timestamp as text instead of a datetime
    df.insert(loc=0, column='Creation_DT', value=str(pd.to_datetime('today')))
    df.insert(loc=1, column='Creation_By', value="Sean")
    df.to_parquet("/home/sample.parquet")
add_col(df)
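If a particular text format is wanted for Creation_DT rather than the default produced by str(), strftime on the timestamp is one option. The sketch below uses a made-up DataFrame in place of the CSV file, and the format string is only an example.

```python
import pandas as pd

# Made-up stand-in for the contents of the CSV file
df = pd.DataFrame({"value": [10, 20, 30]})

# Format the current timestamp as text; the format string is an example
stamp = pd.Timestamp.today().strftime("%Y-%m-%d %H:%M:%S")

# A scalar value is repeated for every row
df.insert(loc=0, column="Creation_DT", value=stamp)
df.insert(loc=1, column="Creation_By", value="Sean")
```

The first two columns are then Creation_DT and Creation_By, and every Creation_DT cell holds the same string.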
I'm trying to write some very simple code. Essentially, I read in two columns of time-series data (which correspond to slightly different time bins) and loop through different percentage weights, blending the second column with itself in consecutive time bins. However, when I run this loop, the original dataframe (specifically, df['EST']) is somehow being changed by this line:
X_new[j-1]=Wt*X_temp[j-1]+(1-Wt)*X_temp[j]
I narrowed it down to this line of code, because when I eliminate it, it no longer makes changes to the initial dataframe. I don't understand how this line could be changing the original dataframe.
My complete code:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import csv
import os
os.chdir("C://Users/XXX")
raw_data = open('correlation test.csv')
%matplotlib inline
import matplotlib.pyplot as plt
df=pd.read_csv(raw_data)
Y=df['%<30'][1:].reshape(-1,1)
X_new=df['EST'][1:].reshape(-1,1)
X_temp=df['EST'][1:].reshape(-1,1)
Wt=0
Best_Wt=Wt
Best_Score=1
for i in range(1,100):
    for j in range(1,df.shape[0]-1):
        X_new[j-1]=Wt*X_temp[j-1]+(1-Wt)*X_temp[j]
    asdf=0
    RR=LinearRegression()
    RR.fit(X_new,Y)
    New_Score=np.mean(np.abs((RR.predict(X_new)-Y)))
    if New_Score<Best_Score:
        Best_Score=New_Score
        Best_Wt=Wt
        print('New Best Score:',Best_Score)
        print('New Best Weight:',Best_Wt)
    Wt=Wt+0.01
The file that it pulls from has two columns of percentages: the first column is labeled '%<30' and the second column is labeled 'EST'.
Thank you in advance for your help!
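One possible explanation, assuming that df['EST'][1:].reshape(-1,1) returns a NumPy array that shares memory with the DataFrame's column, is that X_new and X_temp are both views of the same data, so assigning into X_new writes straight through to df['EST']. The sketch below shows the effect with plain NumPy; the est array is made up and stands in for the column.

```python
import numpy as np

# Made-up stand-in for the values behind df['EST']
est = np.array([0.10, 0.20, 0.30, 0.40])

# Slicing and reshaping a contiguous array returns a view, not a copy
view = est[1:].reshape(-1, 1)
shares = np.shares_memory(est, view)

# Writing through the view also changes the original array
view[0] = 99.0

# An explicit copy is independent of the original
independent = est[1:].reshape(-1, 1).copy()
independent[1] = -1.0
```

If this is what happens in the script, creating X_new and X_temp with an explicit .copy() should leave df['EST'] untouched.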