I have a .dat file exported from a mainframe system. It is EBCDIC encoded (cp037). I would like to load the contents into a pandas or Spark dataframe.
I tried using "iconv" to convert the file to ASCII, but it does not support conversion from cp037; "iconv -l" does not list cp037.
What is the best way to achieve this?
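One option, since Python's standard codec set includes cp037, is to decode the file directly in Python rather than with iconv. This is only a minimal sketch: the record layout of the .dat file isn't shown, so the file name, delimiter, and column widths below are placeholders.

import pandas as pd

# Python ships with a cp037 (EBCDIC US/Canada) codec, so pandas can decode the
# file directly; 'mainframe.dat' and the '|' delimiter are placeholders.
df = pd.read_csv('mainframe.dat', encoding='cp037', sep='|')

# If the export uses fixed-width records instead of a delimiter:
# df = pd.read_fwf('mainframe.dat', encoding='cp037', widths=[10, 20, 8])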
Related
I've read a lot of Stack Overflow and other threads that mention how to read an Excel binary file.
Reference: Read XLSB File in Pandas Python
import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
However, I cannot find any solution for writing it back as an .xlsb file after processing with pandas. Can anyone please suggest a workable solution for this using Python?
Any help is much appreciated!
I haven't been able to find any solution to write into xlsb files or create xlsb files using Python.
But maybe one workaround is to save your file as xlsx using any of the many available libraries for that (such as pandas, xlsxwriter, openpyxl) and then convert that file into an xlsb using xlsb-converter: https://github.com/gibz104/xlsb-converter
CAUTION: This repository uses WIN32COM, which is why this script only supports Windows
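If you go that route, the first half of the workaround (writing the processed DataFrame back to .xlsx) might look like the sketch below; the file name is a placeholder, and the .xlsx to .xlsb conversion is then done separately with the xlsb-converter script linked above.

import pandas as pd

# 'df' stands in for your processed DataFrame; the output name is a placeholder.
df = pd.DataFrame({'col_a': [1, 2], 'col_b': ['x', 'y']})

# Write it out as .xlsx first; the .xlsx -> .xlsb step is then handled by the
# xlsb-converter script (Windows only, since it drives Excel via win32com).
df.to_excel('processed.xlsx', index=False, engine='openpyxl')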
You can read the binary file with open_workbook from pyxlsb. Please find the code below:
import pandas as pd
from pyxlsb import open_workbook

path = r'D:\path_to_file.xlsb'
df2 = []
with open_workbook(path) as wb:
    # Sheets are 1-indexed in pyxlsb
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df2.append([item.v for item in row])

# Use the first row as the header
data = pd.DataFrame(df2[1:], columns=df2[0])
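Note that pyxlsb is not bundled with pandas and needs to be installed separately (for example with pip install pyxlsb) before either of the snippets above will run.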
I want to read a specific line of a CSV file into pandas in Python.
Here is the structure of the file :
file :
example
What would be the best way to fill the values into a DataFrame, with the correct parameter names?
Thanks for the help.
Possible methods:
The pandas.read_table method seems to be a good way to read a tabular data file (also in chunks).
doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html
pandas has a good, fast (compiled) CSV reader, pandas.read_csv (there may be more than one).
doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Ref Link: https://codereview.stackexchange.com/questions/152194/reading-from-a-txt-file-to-a-pandas-dataframe
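Since the exact layout of the file isn't shown, here is a minimal sketch of pulling one specific line out of a CSV with pandas.read_csv; the file name and the target line number are assumptions.

import pandas as pd

# Skip the first 9 data rows (the header at row 0 is kept) and read exactly
# one row, i.e. the 10th data line; 'data.csv' and the line number are placeholders.
row = pd.read_csv('data.csv', skiprows=range(1, 10), nrows=1)
print(row)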
So I have a .mat file
It is a little over 1 GB, but I don't know how much data it contains. I want to convert this .mat file to a NumPy file in Python so I can look at the data and see what is in it. How do I do this conversion?
I think you have two options to read it.
Reading it in Python:
import scipy.io

# loadmat returns a dict mapping MATLAB variable names to NumPy arrays
mat = scipy.io.loadmat('fileName.mat')
Converting it to .csv in MATLAB in order to read it in Python later:
FileData = load('FileName.mat');
csvwrite('FileName.csv', FileData.M);
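If the end goal is a NumPy file on disk, a minimal follow-up sketch is below; the variable name 'M' is a placeholder, and note that scipy.io.loadmat cannot read MATLAB v7.3 files (those need an HDF5 reader such as h5py).

import numpy as np
import scipy.io

mat = scipy.io.loadmat('fileName.mat')

# List the variables stored in the file (keys starting with '__' are metadata)
print([k for k in mat if not k.startswith('__')])

# Save one variable as a .npy file; 'M' is a placeholder variable name
np.save('fileName.npy', mat['M'])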
I have a problem where I have to fetch data through API calls using Python. I have the data; now I have to convert it to a csv file.
We are not allowed to use numpy or pandas; we can only use "import collections" to generate the csv file.
I am a beginner in Python. Can someone help me with this, please?
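A minimal sketch using only the standard library is below; the field names and records are made up, and it assumes the built-in csv module is allowed alongside collections (if not, the rows can be joined with commas and written by hand instead).

import collections
import csv

# Hypothetical records, e.g. built from the parsed API response
rows = [
    collections.OrderedDict([('id', 1), ('name', 'alice')]),
    collections.OrderedDict([('id', 2), ('name', 'bob')]),
]

with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)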
I would like to save a huge pyspark dataframe as a Hive table. How can I do this efficiently? I am looking to use saveAsTable(name, format=None, mode=None, partitionBy=None, **options) from pyspark.sql.DataFrameWriter.
# Let's say I have my dataframe, my_df
# Am I able to do the following?
my_df.write.saveAsTable('my_table')
My question is which formats are available for me to use and where can I find this information for myself? Is OrcSerDe an option? I am still learning about this. Thank you.
The following file formats are supported:
text
csv
jdbc
json
parquet
orc
Reference: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
So I was able to write the pyspark dataframe to a compressed Hive table by using a pyspark.sql.DataFrameWriter. To do this I had to do something like the following:
my_df.write.orc('my_file_path')
That did the trick.
https://spark.apache.org/docs/1.6.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.write
I am using pyspark 1.6.0 btw
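For writing straight to a Hive table instead of a path, a minimal sketch of saveAsTable with an explicit format is below; the table name, mode, and partition column are assumptions, and Hive support has to be enabled on your SQLContext/SparkSession.

# Uses my_df from above; the table name, mode, and partition column are placeholders.
my_df.write.saveAsTable(
    'my_table',
    format='orc',
    mode='overwrite',
    partitionBy='load_date',
)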