How I can displace header of data in python.pandas? - python-3.x

I have a problem with opening a .prn file using pandas. My file contains spaces at the end of lines which causes a shift in the headers. Below is an example of what I need. How can I get the behaviour I want?
import pandas as pd
filename='D:/EXP/7head+2cup 60kHz/k=15 I=80 mA 20 kg/1p.prn'
df=pd.read_csv(filename, sep='\t', header=[0])
a = list(df['Temp'])
print(df)
print(a)
Input:
Expected output:

Related

How to get specific data from excel

Any idea on how can I acccess or get the box data (see image) under TI_Binning tab of excel file using python? What module or similar code you can recommend to me? I just need those specifica data and append it on other file such as .txt file.
Getting the data you circled:
import pandas as pd
df = pd.read_excel('yourfilepath', 'TI_Binning', skiprows=2)
df = df[['Number', 'Name']]
To appending to an existing text file:
import numpy as np
with open("filetoappenddata.txt", "ab") as f:
np.savetxt(f, df.values)
More info here on np.savetxt for formats to fit your output need.

Reading Large CSV file using pandas

I have to read and analyse a logging file from CAN which is in CSV format. It has 161180 rows and I'm only separating 566 columns with semicolon. This is the code I have.
import csv
import dtale
import pandas as pd
path = 'C:\Thesis\Log_Files\InputOutput\Input\Test_Log.csv'
raw_data = pd.read_csv(path,engine="python",chunksize = 1000000, sep=";")
df = pd.DataFrame(raw_data)
#df
dtale.show(df)
I have following error when I run the code in Jupyter Notebook and it's encountering with below error message. Please help me with this. Thanks in advance!
MemoryError: Unable to allocate 348. MiB for an array with shape (161180, 566) and data type object
import time
import pandas as pd
import csv
import dtale
chunk_size = 1000000
batch_no=1
for chunk in pd.read_csv("C:\Thesis\Log_Files\InputOutput\Input\Book2.csv",chunksize=chunk_size,sep=";"):
chunk.to_csv('chunk'+str(batch_no)+'.csv', index=False)
batch_no+=1
df1 = pd.read_csv('chunk1.csv')
df1
dtale.show(df1)
I used above code for only 10 rows and 566 columns. Then it's working. If I consider all the rows (161180), it's not working. Could anyone help me with this. Thanks in advance!
I have attached the output here
You are running out of RAM when loading in the datafile. Best option is to split the file into chunks and read in chunks of files
To read the first 999,999 (non-header) rows:
read_csv(..., nrows=999999)
If you want to read rows 1,000,000 ... 1,999,999
read_csv(..., skiprows=1000000, nrows=999999)
You'll probably also want to use chunksize:
This returns a TextFileReader object for iteration:
chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
process(chunk)

python3 - after successfully reading CSV file into 2D Arrary - how to print a single cell by index?

I want to read a CSV File (filled in by temperature sensors) by python3.
Reading CSV File into array works fine. Printing a single cell by index fails. Please help for the right line of code.
This is the code.
import sys
import pandas as pd
import numpy as np
import array as ar
#Reading the CSV File
# date;seconds;Time;Aussen-Temp;Ruecklauf;Kessel;Vorlauf;Diff
# 20211019;0;20211019;12,9;24;22,1;24,8;0,800000000000001
# ...
# ... (2800 rows in total)
np = pd.read_csv('/var/log/LM92Temperature_U-20211019.csv',
header=0,
sep=";",
usecols=['date','seconds','Time','Aussen- Temp','Ruecklauf','Kessel','Vorlauf','Diff'])
br = np # works fine
print (br) # works fine - prints whole CSV Table :-) !
#-----------------------------------------------------------
# Now I want to print the element [2] [3] of the two dimensional "CSV" array ... How to manage that ?
print (br [2] [3]) # ... ends up with an error ...
# what is the correct coding needed now, please?
Thanks in advance & Regards
Give the name of the column, not the index:
print(br['Time'][3])
As an aside, you can read your data with only the following, and you may want decimal=',' as well:
import pandas as pd
br = pd.read_csv('/var/log/LM92Temperature_U-20211019.csv', sep=';', decimal=',')
print(br)

how to solve the keyerror when I load a CSV file using pandas

I use pandas to load a csv file and want to print out data of row, here is original data
orginal data
I want to print out 'violence' data for make a bar chart, but it occuar a keyerror, here is my code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
c_data=pd.read_csv('crime.csv')
print(c_data.head())
print (c_data['violence'])
and the error
error detail
error detail
I tried use capital VIOLENCE,print (c_data['VIOLENCE']),but also failed
error detail
error detail
can someone tell me how to work it out?
Try the following if your data is small:
with open('crime.csv', 'r') as my_file:
reader = csv.reader(my_file)
rows = list(reader)
print rows[3]
If your data is big, try this:
from itertools import islice
with open('crime.csv', 'r') as my_file:
reader = csv.reader(my_file)
print next(islice(reader, 3, 4))

Pandas puts all data from a row in one column

I have another problem with csv. I am using pandas to remove duplicates from a csv file. After doing so I noticed that all data has been put in one column (preprocessed data has been contained in 9 columns). How to avoid that?
Here is the data sample:
39,43,197,311,112,88,47,36,Label_1
Here is the function:
import pandas as pd
def clear_duplicates():
df = pd.read_csv("own_test.csv", sep="\n")
df.drop_duplicates(subset=None, inplace=True)
df.to_csv("own_test.csv", index=False)
Remove sep, because default separator is , in read_csv:
def clear_duplicates():
df = pd.read_csv("own_test.csv")
df.drop_duplicates(inplace=True)
df.to_csv("own_test.csv", index=False)
Maybe not so nice, but works too:
pd.read_csv("own_test.csv").drop_duplicates().to_csv("own_test.csv", index=False)

Resources