Python script that reads csv files - python-3.x

I need a script that reads CSV files, gets the headers, and filters rows by a specific column. I have tried researching this but have not found anything of quality.
Any help will be deeply appreciated.

There's a standard csv library included with Python.
https://docs.python.org/3/library/csv.html
Its csv.DictReader class treats the first row of the CSV as the header and yields one dictionary per row, keyed by those column names.

Alternatively, pandas.read_csv reads the same kind of file into a DataFrame.
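As a minimal sketch of the standard-library approach, the snippet below reads the headers and keeps only the rows whose column matches a value. The function name filter_rows and the sample data are made up for illustration; only csv.DictReader and its fieldnames attribute come from the csv module.

```python
import csv
import io


def filter_rows(csv_file, column, value):
    """Return (headers, rows) where rows are the dicts whose `column` equals `value`.

    `csv_file` is any open text file object (hypothetical helper, not part of csv).
    """
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames  # taken from the first row of the file
    if headers is None or column not in headers:
        raise KeyError(f"column {column!r} not found in headers {headers!r}")
    # All CSV values are strings, so compare against a string.
    rows = [row for row in reader if row[column] == value]
    return headers, rows


# Illustrative in-memory data; with a real file use open(path, newline="").
sample = io.StringIO(
    "name,city,age\n"
    "Ada,London,36\n"
    "Linus,Helsinki,28\n"
    "Grace,London,45\n"
)
headers, rows = filter_rows(sample, "city", "London")
print(headers)
for row in rows:
    print(row["name"], row["age"])
```

With a file on disk, pass open("my_file.csv", newline="") instead of the StringIO object.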

Related

how to read a specific text file in pandas

I want to read a specific line of a CSV file with pandas in Python.
Here is the structure of the file :
file :
example
What would be the best way to fill the values into a DataFrame with the correct parameter names?
Thanks for the help.
Possible methods:
pandas.read_table is a good way to read a tabular data file, optionally in chunks.
doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html
pandas.read_csv is a fast CSV reader backed by a compiled parser.
doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Ref Link: https://codereview.stackexchange.com/questions/152194/reading-from-a-txt-file-to-a-pandas-dataframe
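One possible way to pull a single line into a DataFrame while keeping the column names is to combine the skiprows and nrows parameters of pandas.read_csv. This is only a sketch: the helper name read_single_row and the sample data are assumptions, since the original file structure is not shown.

```python
import io

import pandas as pd


def read_single_row(source, row_index):
    """Read only the data row at `row_index` (0-based, header excluded).

    Line 0 is kept as the header; lines 1..row_index are skipped,
    and nrows=1 stops the parser after the wanted row.
    """
    return pd.read_csv(source, skiprows=range(1, row_index + 1), nrows=1)


# Hypothetical file content, used here instead of a path on disk.
sample = (
    "name,value,unit\n"
    "temperature,21.5,C\n"
    "pressure,1013,hPa\n"
    "humidity,40,%\n"
)
row = read_single_row(io.StringIO(sample), 1)
print(row)
```

For large files, passing chunksize to read_csv or read_table returns an iterator of DataFrames instead of loading everything at once.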

.csv is empty after reading it with pd.read_csv()

After running
df = pd.read_csv('my_file.csv'),
my original .csv file goes blank. Is there a way to read the .csv data without emptying the original file?
pd.read_csv() does not modify the file!
Here, the file before using pd.read_csv():
Using it:
And now if we check it again, the file hasn't changed (as expected):
So the problem isn't with pd.read_csv(). I would assume that you have other code that's messing things up. Take a look and tell us, so we can help you better.
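A quick way to check this yourself is to hash the file before and after the read. The sketch below does that on a temporary file. The last step shows one common culprit, opening the same path in "w" mode, which truncates the file immediately; that is an assumption about what the other code might be doing, not something the question confirms.

```python
import hashlib
import os
import tempfile

import pandas as pd


def sha256_of(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.csv")
    with open(path, "w", newline="") as f:
        f.write("a,b\n1,2\n3,4\n")

    before = sha256_of(path)
    df = pd.read_csv(path)  # read-only access to the file
    after = sha256_of(path)
    unchanged = before == after
    print("unchanged after read_csv:", unchanged)

    # Assumed culprit: opening in "w" mode empties the file right away,
    # even if nothing is written afterwards.
    with open(path, "w"):
        pass
    size_after_w = os.path.getsize(path)
    print("size after open(path, 'w'):", size_after_w)
```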

Difficulty with encoding while reading data in Spark

In connection with my earlier question, when I give the command,
filePath = sc.textFile("/user/cloudera/input/Hin*/datafile.txt")
filePath.collect()
some parts of the data have '\xa0' prefixed to every word, while other parts do not have that special character. I am attaching two pictures, one with '\xa0' and one without. The content shown in both pictures belongs to the same file; only some parts of that file are read this way by Spark. I have checked the original data file in HDFS, and it has no such problem.
I suspect it has something to do with encoding. I tried using replace inside flatMap, for example flatMap(lambda line: line.replace(u'\xa0', ' ').split(" ")) and flatMap(lambda line: line.replace(u'\xa0', u' ').split(" ")), but neither worked for me. This question might sound dumb, but I am new to Apache Spark and need some assistance to overcome this problem.
Can anyone please help me? Thanks in advance.
Check the encoding of your file. When you use sc.textFile, Spark expects a UTF-8 encoded file.
One solution is to read your file with sc.binaryFiles and then apply the expected encoding.
sc.binaryFiles creates a key/value RDD where the key is the path to the file and the value is the file content as bytes.
If you need to keep only the text and apply a decoding function:
filePath = sc.binaryFiles("/user/cloudera/input/Hin*/datafile.txt")
filePath.map(lambda x: x[1].decode('utf-8'))  # or another encoding, depending on your file
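The decoding step can be sketched in plain Python, independent of Spark. The helper below is hypothetical (the name decode_and_split is not a Spark API): it decodes raw bytes with a chosen encoding, replaces the non-breaking space U+00A0 with a regular space, and splits into words. A function like this could then be passed to flatMap on the values of the RDD, assuming the file's real encoding is known.

```python
from typing import List


def decode_and_split(raw: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode raw bytes, normalise non-breaking spaces, and split into words.

    `encoding` is an assumption about the file; try e.g. "latin-1" or "cp1252"
    if UTF-8 decoding fails or yields stray characters.
    """
    text = raw.decode(encoding)
    # U+00A0 is the non-breaking space that shows up as '\xa0'.
    return text.replace("\u00a0", " ").split()


# In UTF-8 the non-breaking space is the two bytes 0xC2 0xA0.
utf8_bytes = "foo\u00a0bar baz".encode("utf-8")
words_utf8 = decode_and_split(utf8_bytes)

# In Latin-1 it is the single byte 0xA0.
latin1_bytes = b"foo\xa0bar baz"
words_latin1 = decode_and_split(latin1_bytes, encoding="latin-1")

print(words_utf8)
print(words_latin1)
```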

Compare CSV file data using Nodejs

I want to compare the data in two .csv files and find what has been updated between them, using Node.js.
Is there any way to do this in Node.js?
Thanks, I am very new to this.
It will be easiest using one of the following modules:
https://www.npmjs.com/package/csv
https://www.npmjs.com/package/tsv
or others that you can find at:
https://www.npmjs.com/browse/keyword/csv
https://www.npmjs.com/browse/keyword/tsv
(don't worry if it's CSV or TSV - just make sure that you use the correct delimiter which is comma in your case).

Csv to csv (XSLT)

We have to perform a CSV transformation into another CSV (1 file to 1 file). We are looking for a cheap solution. The first idea that popped into my mind was Excel, but the file will be too big.
1) Is it possible to do a CSV to CSV conversion through XSLT? I can't seem to find a tool or google result which tells me how I could possibly do it.
2) Is there a better approach to do CSV transformations?
Edit:
It should be possible to automate/schedule the process
My answers below
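1) Not directly. XSLT 1.0 only accepts XML input. XSLT 2.0 and later can load plain text with unparsed-text(), but parsing CSV that way is awkward.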
2) Yes, as the answer to question 1 is "No", it is reasonable to assert there are better approaches. As CSV is not a standardised format there are a plethora of varied approaches to choose from.
Use Rscript to automate the transformation of CSV:
# Rscript --vanilla myscript.R
Where myscript.R is something like:
csv <- read.csv(file="input.csv", header=TRUE, sep=",")
# Modify your CSV ...
# Write the modified data frame; row.names=FALSE avoids an extra index column
write.csv(csv, file="output.csv", row.names=FALSE)
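The same kind of automated transformation could also be sketched in Python with the standard csv module. The version below streams row by row, so it does not need to hold a large file in memory. The function name transform_csv, the column names, and the rename mapping are all illustrative assumptions, not part of the answer above.

```python
import csv
import io


def transform_csv(src, dst, keep_columns, rename=None):
    """Copy selected columns from `src` to `dst`, optionally renaming them.

    `src` and `dst` are open text file objects; rows are processed one at a
    time, so memory use stays flat regardless of file size.
    """
    rename = rename or {}
    reader = csv.DictReader(src)
    out_fields = [rename.get(col, col) for col in keep_columns]
    writer = csv.DictWriter(dst, fieldnames=out_fields)
    writer.writeheader()
    for row in reader:
        writer.writerow({rename.get(col, col): row[col] for col in keep_columns})


# Illustrative in-memory data; with real files use open(path, newline="").
src = io.StringIO("id,name,price\n1,apple,0.5\n2,pear,0.75\n")
dst = io.StringIO()
transform_csv(src, dst, ["name", "price"], rename={"price": "unit_price"})
result = dst.getvalue()
print(result)
```

Saved as a script, this can be scheduled with cron or Windows Task Scheduler in the same way as the Rscript call.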
