How can I convert .mat files to NumPy files in Python?

I have a .mat file.
It is a little over 1 GB, but I don't know how much data it contains. I want to convert this .mat file to a NumPy file in Python so I can look at the data and see what is in it. How do I do this conversion?

I think you have two options to read it.
Reading it in python:
import scipy.io
mat = scipy.io.loadmat('fileName.mat')
Converting it to .csv in MATLAB in order to read it in python later:
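FileData = load('FileName.mat');
% M is assumed to be the name of the matrix stored in the file; replace it with your own variable name
csvwrite('FileName.csv', FileData.M);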
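Since the question asks for a NumPy file specifically, the first option can be taken one step further. The following is a minimal sketch, not a definitive recipe: the function name mat_to_npz and the file names are made up for illustration, and it assumes the file is in MAT format v7.2 or older. scipy.io.loadmat does not read v7.3 files, which are HDF5-based and would need a library such as h5py instead.

```python
import numpy as np
import scipy.io


def mat_to_npz(mat_path, npz_path):
    """Load a MAT-file (v7.2 or older) and save its variables to a .npz archive.

    Returns a dict mapping each variable name to its shape, which is a quick
    way to see what the file contains.
    """
    mat = scipy.io.loadmat(mat_path)
    # loadmat adds metadata entries such as __header__ and __version__;
    # keep only the actual MATLAB variables.
    arrays = {name: value for name, value in mat.items() if not name.startswith("__")}
    np.savez(npz_path, **arrays)
    return {name: np.shape(value) for name, value in arrays.items()}


# Hypothetical usage:
# print(mat_to_npz("fileName.mat", "fileName.npz"))
```

The saved archive can later be reopened with np.load("fileName.npz"), which gives access to each variable by name.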

Related

Python - read EBCDIC encoded .dat file into a pandas df

I have a .dat file exported from a mainframe system. It is EBCDIC encoded (cp037). I would like to load the contents into a pandas or Spark dataframe.
I tried using "iconv" to convert the file to ASCII, but it does not support conversion from cp037: "iconv -l" does not list cp037.
What is the best way to achieve this?
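One possible approach, offered as a sketch rather than a tested answer for this particular file: Python ships a cp037 codec, so the file can be re-encoded without iconv. The function name ebcdic_to_utf8 and the chunk size are made up for illustration. The sketch assumes the file holds only character data; mainframe exports with packed-decimal or binary fields cannot be decoded this way and need a record layout.

```python
def ebcdic_to_utf8(src_path, dst_path, encoding="cp037", chunk_size=1 << 20):
    """Re-encode an EBCDIC text file as UTF-8, reading it chunk by chunk."""
    # newline="" turns off newline translation so characters pass through unchanged.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            dst.write(chunk)


# Hypothetical usage:
# ebcdic_to_utf8("export.dat", "export_utf8.txt")
```

If the file is delimited text, passing encoding="cp037" directly to pandas.read_csv may be enough on its own; fixed-width records would call for pandas.read_fwf with the column widths from the record layout.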

How to load .gds file into Pandas?

I have a .gds file. How can I read that file with pandas and do some analysis? What is the best way to do that in Python? The file can be downloaded here.
You need to change the encoding and read the data as latin1:
import pandas as pd
df = pd.read_csv('example.gds', header=27, encoding='latin1')
This loads the data. header=27 skips the first 27 rows of the file, which come before the actual table, and uses the next row as the column names.
The gdspy package comes in handy for such applications. For example:
import gdspy
gdsii = gdspy.GdsLibrary(infile="filename.gds")
main_cell = gdsii.top_level()[0]  # Assume a single top level cell
points = main_cell.polygons[0].polygons[0]  # vertices of the first polygon
for p in points:
    print("Points: {}".format(p))

Write output in xlsb file format (Excel binary file format) using pandas and pyxlsb

I've read a lot of stackoverflow and other threads where it's been mentioned how to read excel binary file.
Reference: Read XLSB File in Pandas Python
import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
However, I cannot find any solution for writing the data back to an .xlsb file after processing it with pandas. Can anyone suggest a workable solution for this in Python?
Any help is much appreciated!
I haven't been able to find any way to write to or create .xlsb files using Python.
One workaround may be to save your file as .xlsx using any of the many libraries that can do that (such as pandas, xlsxwriter, openpyxl) and then convert that file to .xlsb using xlsb-converter: https://github.com/gibz104/xlsb-converter
CAUTION: This repository uses WIN32COM, which is why this script only supports Windows
You can read the binary file with open_workbook from pyxlsb. The code is below:
import pandas as pd
from pyxlsb import open_workbook
path = r'D:\path_to_file.xlsb'
df2 = []
with open_workbook(path) as wb:
    with wb.get_sheet(1) as sheet:  # sheets are numbered from 1
        for row in sheet.rows():
            df2.append([item.v for item in row])  # item.v is the cell value
# The first row holds the column names
data = pd.DataFrame(df2[1:], columns=df2[0])

How to read a binary file and write it in a txt or csv file using python?

I have a binary file (.man), containing data that I want to read, using python 3.7. The idea is to convert this binary file into a txt or a csv file.
I know the total number of values in the binary file but not the number of bytes per value.
I have read many posts about binary files, but none was helpful...
Thank you in advance,
Simply put, yes.
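with open('file.man', 'rb') as f:
    data = f.readlines()
    print(data)  # list of bytes objects, printed as bytes literals
Opening a file with the mode 'rb' reads it as raw bytes without decoding it as text. Printing the result shows printable bytes as ASCII characters and everything else as \x escapes; the values themselves are not interpreted, so you still need to know how each value is encoded.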
The solution I found is this:
import struct
data = []
with open('binary_file', 'rb') as f:
    while True:
        chunk = f.read(4)  # one 4-byte float at a time
        if len(chunk) < 4:  # end of file
            break
        data.append(struct.unpack('f', chunk)[0])
Of course, this works only because I know that the values are encoded in single precision.
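The same read, followed by the conversion to a text file that the question asks for, can be sketched with NumPy. This is an illustration under stated assumptions, not the method used above: the function name binary_floats_to_csv is made up, and the default dtype "<f4" assumes little-endian single-precision values with no header in the file.

```python
import numpy as np


def binary_floats_to_csv(bin_path, csv_path, dtype="<f4"):
    """Read a headerless binary file of fixed-size values and write one value per line."""
    # fromfile interprets the whole file as a flat array of the given dtype.
    values = np.fromfile(bin_path, dtype=dtype)
    # %.9g keeps enough digits to round-trip a single-precision float.
    np.savetxt(csv_path, values, fmt="%.9g")
    return values


# Hypothetical usage:
# values = binary_floats_to_csv("file.man", "file.csv")
# print(values.size)  # compare with the known number of values
```

Because the total number of values is known, comparing it with values.size for a few candidate dtypes (for example "<f4", "<f8", "<i2") is one way to guess the number of bytes per value.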

read_csv one file from several files in a gzip?

I have several files in my tar.gz archive. I want to read only one of them into a pandas data frame. Is there any way to do that?
Pandas can read a file inside a gz, but there seems to be no way to tell it to read one specific file when the archive contains several.
Would appreciate any thoughts.
Babak
To read a specific file from a compressed archive, we just need to give its name or position. For example, to read a specific csv file from a zip archive, we can open that member and read its content:
from zipfile import ZipFile
import pandas as pd
# opening the zip file in READ mode
with ZipFile("results.zip") as z:
    # infolist()[2] is the third entry in the archive, here test.csv
    read = pd.read_csv(z.open(z.infolist()[2].filename))
    print(read)
Here the contents of results.zip look like this, and I want to read test.csv:
$ data_description.txt sample_submission.csv test.csv train.csv
If you use pardata, you can do this in one line:
import pardata
data = pardata.load_dataset_from_location('path-to-zip.zip')['table/csv']
The returned data variable should be a dictionary of all csv files in the zip archive.
Disclaimer: I'm one of the main co-authors of pardata.
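For the tar.gz case in the question itself, the standard library's tarfile module allows a similar approach. The following is a rough sketch: the function name read_tar_member and the archive and member names in the usage comment are hypothetical.

```python
import io
import tarfile


def read_tar_member(archive_path, member_name):
    """Return one member of a .tar.gz archive as an in-memory binary buffer."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        # extractfile raises KeyError if the name is not in the archive
        # and returns None for entries that are not regular files.
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        # Copy the bytes out so the buffer stays usable after the archive closes.
        return io.BytesIO(extracted.read())


# Hypothetical usage:
# buf = read_tar_member("results.tar.gz", "test.csv")
```

The returned buffer is a file-like object, so it can be passed straight to pd.read_csv(buf). Copying the member into memory keeps the sketch simple; for very large members it may be better to call pd.read_csv on the extracted file object while the archive is still open.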
