I have a dataframe like below:
import pandas as pd
data = {'Words':['actually','he','came','from','home','and','played'],
'Col2':['2','0','0','0','1','0','3']}
data = pd.DataFrame(data)
The dataframe looks like this:

      Words Col2
0  actually    2
1        he    0
2      came    0
3      from    0
4      home    1
5       and    0
6    played    3
I write this dataframe to the drive using the command below:
import numpy as np
np.savetxt('/folder/file.txt', data.values, fmt='%s', delimiter='\t')
And the next script reads it back with the line below:
data = load_file('/folder/file.txt')
Below is the load_file function that reads a text file:
def load_file(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        data = f.readlines()
    return data
The data will be a tab-separated list.
print(data)
gives me the following output:
['actually\t2\n', 'he\t0\n', 'came\t0\n', 'from\t0\n', 'home\t1\n', 'and\t0\n', 'played\t3\n']
I don't want to write the file to the drive and then read it back for processing. Instead, I want to convert the dataframe to a tab-separated list and process it directly. How can I achieve this?
I checked the existing answers, but most convert a list to a dataframe, not the other way around.
Thanks in advance.
Try using .to_csv(). The .strip() removes the trailing newline so the split doesn't leave an empty string at the end:
df_list = data.to_csv(header=None, index=False, sep='\t').strip().split('\n')
df_list:
['actually\t2',
'he\t0',
'came\t0',
'from\t0',
'home\t1',
'and\t0',
'played\t3'
]
If you want to keep the trailing newline on each element, as in your original output, splitlines(keepends=True) does that directly:
df_list = data.to_csv(header=None, index=False, sep='\t').splitlines(keepends=True)
df_list:
['actually\t2\n',
'he\t0\n',
'came\t0\n',
'from\t0\n',
'home\t1\n',
'and\t0\n',
'played\t3\n'
]
I think this achieves the same result without writing to the drive:
df_list = list(data.apply(lambda row: row['Words'] + '\t' + row['Col2'] + '\n', axis=1))
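The same idea can be written a bit more compactly as a list comprehension with an f-string (just an alternative spelling of the line above, same output):

df_list = [f"{w}\t{c}\n" for w, c in zip(data['Words'], data['Col2'])]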
Try:
data.apply("\t".join, axis=1).tolist()
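Note that this works here because both columns already hold strings. If some columns were numeric, casting first keeps the same one-liner working (a small sketch):

data.astype(str).apply("\t".join, axis=1).tolist()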
I want to create a DataFrame from existing lists (each row of the file should become a row of the DataFrame).
with open(filename, mode='r', encoding='cp1252') as f:
    lines = f.readlines()
    liste1 = str(lines[0])
    df1 = pd.DataFrame(liste1)
Can anyone help me, please?
Below are the first 3 rows of file f1.
['x1', 'major', '1198', 'TCP']
['x1', 'minor', '1198', 'UDP']
['x2', 'major', '1198', 'UDP']
If I understand this properly, you want each row in the DataFrame to be a string read from a line in the file?
Note that liste1 in your case is a string, so I am not sure what you are going for.
This approach should work anyway:
import pandas as pd

df1 = pd.DataFrame()
with open(filename, mode='r', encoding='cp1252') as f:
    lines = f.readlines()
    liste1 = str(lines[0])
    # note: DataFrame.append() was removed in pandas 2.0; use pd.concat() there
    df1 = df1.append(pd.Series(liste1), ignore_index=True)
So if liste1 has form
> "This is a string"
then your DataFrame will look like this
df1.head()
0
0 This is a string
if liste1 has form
> ["This", "is", "a", "list"]
then your DataFrame will look like this
df1.head()
0 1 2 3
0 This is a list
You can then call this append() routine as many times as you want inside a loop.
However, I suspect that there is a function, such as pd.read_table(), that can do this all for you automatically (as @jezrael suggested in the comments to your question).
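For instance, a minimal sketch with pd.read_table(), assuming each line of the file should land in a single cell (the filename and encoding are taken from your code):

import pandas as pd

# with the default tab separator, each whole line becomes one cell,
# since the lines contain no tabs
df1 = pd.read_table(filename, header=None, encoding='cp1252')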
I have lots of big CSV files, one per country, and I want to merge their columns into a single CSV file. Each file has 'Year' as its index, and all the files match in length and in the years they cover. An example of a Japan.csv file is shown below.
If anyone can help me, please let me know. Thank you!!
Try using:
import pandas as pd
import glob

l = []
path = 'path/to/directory/'
csvs = glob.glob(path + "*.csv")
for i in csvs:
    df = pd.read_csv(i, index_col=None, header=0)
    l.append(df)
df = pd.concat(l, ignore_index=True)
This should work. It goes over each file name, reads it, and combines everything into one df. You can export this df to CSV or do whatever else with it. Good luck.
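For example, to write the combined frame back out (the output filename is just a placeholder):

df.to_csv('all_countries.csv', index=False)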
import pandas as pd

def combine_csvs_into_one_df(names_of_files):
    one_big_df = pd.DataFrame()
    for file in names_of_files:
        try:
            content = pd.read_csv(file)
        except FileNotFoundError:  # a missing file, matching the message below
            print(file, "was not found")
            continue
        one_big_df = pd.concat([one_big_df, content])
        print(file, "added!")
    print("------")
    print("Finished")
    return one_big_df
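A hypothetical call, assuming the country files sit in one directory (the path and output name are placeholders):

import glob

files = glob.glob('path/to/directory/*.csv')
big_df = combine_csvs_into_one_df(files)
big_df.to_csv('combined.csv', index=False)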
I am writing the results of a query to a CSV file. However, I am looking to add a custom header (H|10|27) and footer (F|<row_count>).
I read related posts on SO, but I couldn't find anything specific to Python and pandas, and the documentation doesn't cover this either.
I am not sure how to go about it.
My code:
cs = connect_snowflake().cursor()
try:
    cs.execute("select * from <Tables> where id in (20, 24, 61);")
    datas = cs.fetchall()
    df = pd.DataFrame(datas)
    print(df.head(10))
    df.to_csv('P202461.csv', sep='|', header=False)
finally:
    cs.close()
The footer here should be the total row count, which I can fetch into a separate variable and pass in.
Any help would be appreciated.
Doesn't seem like you have a real need for pandas in this case.
Try the standard csv module.
Something like this should work:
import csv

def output_query_to_csv(query, filename='P202461.csv'):
    # newline='' keeps the csv module from writing blank lines on Windows
    with connect_snowflake() as conn, open(filename, 'w', encoding='utf8', newline='') as csv_out:
        cs = conn.cursor()
        cs.execute(query)
        datas = cs.fetchall()
        writer = csv.writer(csv_out, delimiter='|')
        header = ('H', '10', '27')
        footer = ('F', len(datas))
        writer.writerow(header)
        writer.writerows(datas)
        writer.writerow(footer)
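If you would rather stay in pandas, one possible sketch is to write the header line yourself and let to_csv() fill in the data between it and the footer (write_with_header_footer is a hypothetical helper, not a pandas API):

import pandas as pd

def write_with_header_footer(df, filename='P202461.csv'):
    # hypothetical helper: header and footer are plain lines around the CSV body
    with open(filename, 'w', encoding='utf8', newline='') as f:
        f.write('H|10|27\n')                              # custom header
        df.to_csv(f, sep='|', header=False, index=False)  # data rows
        f.write('F|{}\n'.format(len(df)))                 # footer with row count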
I am trying to import data from a large CSV file, 15 GB+. I have to select a few columns with rows matching specific values (there are over 50 columns). As an example, I have used:
df = pd.read_csv('filename.csv', nrows=10000, usecols=['ID', 'State'])
Is there a way to specify something like this:
df=pd.read_csv('filename.csv', nrows=10000, usecols=['ID', 'State'='abc'])
I can't find any option to do that.
There's no option to filter rows like that while reading csv files.
What you can do is create an iterator, apply your filter to each chunk, and then concat the chunks. It would look something like this:
iterable = pd.read_csv('filename.csv', usecols=['ID', 'State'], iterator=True, chunksize=10000)
df = pd.concat([chunk[chunk['State'] == 'abc'] for chunk in iterable])
Assuming that the resulting DataFrame for a selection where 'State' == 'abc' is small enough to fit in RAM, you can extract those rows from the CSV as follows. df is the resulting DataFrame.
import pandas as pd

inPath = 'filename.csv'
chunkSize = 10000  # size of chunks relies on your available memory
tmpDf = pd.read_csv(inPath, chunksize=chunkSize, usecols=['ID', 'State'])
for chunk in tmpDf:  # filter each chunk, not the reader itself
    try:
        df
    except NameError:
        df = chunk[chunk['State'] == 'abc']
    else:
        df = pd.concat([df, chunk[chunk['State'] == 'abc']])
I tried to convert a text file into .csv format by reading each line with the Python csv module and converting it to a list. But I couldn't get the expected output: the first line is stored in row 1, yet the second line ends up in row 3, the next in row 5, and so on. Since I am new to Python, I don't know how to skip those blank lines and store everything in the right order.
import csv

def FileConversion():
    with open('TextToCSV.txt', 'r') as textFile:
        LineStripped = (eachLine.strip() for eachLine in textFile)
        lines = (eachLine.split(" ") for eachLine in LineStripped if eachLine)
        with open('finalReport.csv', 'w') as CSVFile:
            writer = csv.writer(CSVFile)
            writer.writerow(('firstName', 'secondName', 'designation', 'age'))
            writer.writerows(lines)
The extra blank rows come from opening the output file without newline='': in text mode on Windows, csv.writer's \r\n record terminator gets translated to \r\r\n, which shows up as an empty line between records. Passing newline='' to open() fixes your version. But why don't you try doing something more simple:
import pandas as pd

# header=None stops read_csv from swallowing the first data row as column names
aux = pd.read_csv("TextToCSV.txt", sep=" ", header=None,
                  names=['firstName', 'secondName', 'designation', 'age'])
aux.to_csv("result.csv", index=False)