How to create an emtpy cell in csv if data not found - python-3.x

I'm trying to scrape some data from a website. It looks like this:
Person 1
Data 1
Data 2
Data 3
Data 5
Person 2
Data 2
Data 3
Data 7
I would like the output in csv to be like:
Person 1 Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7
data data data data
Person 2 Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7
data data data
However, I don't know how to force data output if data is missing. I know there might be some try, except (maybe).
I'm using python 3.6.7 and selenium. BTW in the above example 'data' is any value found for the Data1-Data7 entries.
I hope it is clear.

You need to simply check if the data is empty and if it is, replace the empty text with whatever
for row in csvreader:
try:
#your code for cells which aren't empty
except:
#your code for empty cells
You can also try something like this
This will fill all empty cells to eqaul 0
result.fillna(0, inplace=True)
EDIT--------
Since there is no code posted with your answer here is a basic example of how you could replace the empty cells
for cl in list_of_columns:
df[cl] = df[cl].replace(r'\s+', np.nan, regex=True)
df[cl] = df[cl].fillna(0)

Thanks to all that tried to help me. I found the solution:
Data=None
try:
Cell=elm.find_element_by_xpath(".//div[#class='whatever_the_class_name'][contains(.,'whatever_data')].text
except NoSuchElementException:
Data=' '
line.append(Data)
The penultimate line of code is the answer of my question. Well, the simpest things are (sometimes) the hardest ;)

Related

Reading an Excel file with united cells in Python

I have an excel table of the following type (the problem described below is driven by the presence of the united cells).
I am using read_excel from pandas to read it.
What I want: I would like to use the values in the first column as an index, and to have the values in the third column combined in one cell, e.g. like here.
What I get from directly applying read_excel can be seen here.
If needed: please see the code used to read the file below (I am reading it from google drive in google colab):
path = '/content/drive/MyDrive/ExampleFile.xlsx'
pd.read_excel(path, header = 0, index_col = 0)
Could you please help?
Please let me know if anything in the question is unclear.
here is one way to accomplish it. I created the xls similar to yours, the first column had a heading of sno
# fill the null values with values from previous rows
df=df.ffill()
# combine the rows where class is the same and create a new column
df=df.assign(comb=df.groupby(['class'])['type'].transform(lambda x: ','.join(x)))
# drop the duplicated rows
df2=df.drop_duplicates(subset=['class','comb'])[['class','comb']]
class comb
0 fruit apple,orange
2 toys car,truck,train

How to select a column from a text file which has no header using python

I have a text file which is tabulated. When I open the file in python using pandas, it shows me that the file contains only one column but there are many columns in it. I've tried using pd.DataFrames, sep= '\s*', sep= '\t', but I can't select the column since there is only one column. I've even tried specifying the header but the header moves to the exterior right side and specifies the whole file as one column only. I've also tried .loc method and mentioned specific column number but it always returns rows. I want to select the first column (A, A), third column (HIS, PRO) and fourth column (0, 0).
I want to get the above mentioned specific columns and print it in a CSV file.
Here is the code I have used along with some file components.
1) After opening the file using pd:
[599 rows x 1 columns]
2) The file format:
pdb_id: 1IHV
0 radii_filename: MD_threshold: 4
1 A 20 HIS 0 MaximumDistance
2 A 21 PRO 0 MaximumDistance
3 A 22 THR 0 MaximumDistance
Any help will be highly appreciated.
3) code:
import pandas as pd
df= pd.read_table("file_path.txt", sep= '\t')
U= df.loc[:][2:4]
Any help will be highly appreciated.
If anybody gets any file like this, it can be opened and the column can be selected using the following codes:
f=open('file.txt',"r")
lines=f.readlines()
result=[]
for x in lines:
result.append(x.split(' ')[range])
for w in result:
s='\t'.join(w)
print(s)
Where range is the column you want to select.

Inserting data into a particular column of an Excel sheet [duplicate]

I have a .mat file which contains titles={'time','data'} and 2 column vectors:
time=[1;2;3;4;5] and data=[10;20;30;40;50].
I created a new cell called table={'time','data';time data} and i used:
xlswrite(filename,table);
However, when i open the xlsx file it shows me only the titles and not showing the numbers.
I saw that xlswrite will show empty cell in case im trying to export more than 1 number in a cell.
Is there anything i can do to export the whole vector instead of writing each value in it's cell?
The final result that i tried to get is like this:
time data
1 10
2 20
3 30
4 40
5 50
You have a couple options. Usually what I do is break it into two xlswrite calls, one for the header and one for the data.
titles = {'time','data'};
time = [1;2;3;4;5];
data = [10;20;30;40;50];
xlswrite('myfile.xlsx', titles, 'Sheet1', 'A1');
xlswrite('myfile.xlsx', [time, data], 'Sheet1', 'A2');
Alternatively, if you have R2013b or newer you can also use the table builtin, which has its own method for writing out data. With the same sample data:
mytable = table(time, data, 'VariableNames', titles);
writetable(mytable, 'myfile.xlsx');

xlswrite in case of vectors

I have a .mat file which contains titles={'time','data'} and 2 column vectors:
time=[1;2;3;4;5] and data=[10;20;30;40;50].
I created a new cell called table={'time','data';time data} and i used:
xlswrite(filename,table);
However, when i open the xlsx file it shows me only the titles and not showing the numbers.
I saw that xlswrite will show empty cell in case im trying to export more than 1 number in a cell.
Is there anything i can do to export the whole vector instead of writing each value in it's cell?
The final result that i tried to get is like this:
time data
1 10
2 20
3 30
4 40
5 50
You have a couple options. Usually what I do is break it into two xlswrite calls, one for the header and one for the data.
titles = {'time','data'};
time = [1;2;3;4;5];
data = [10;20;30;40;50];
xlswrite('myfile.xlsx', titles, 'Sheet1', 'A1');
xlswrite('myfile.xlsx', [time, data], 'Sheet1', 'A2');
Alternatively, if you have R2013b or newer you can also use the table builtin, which has its own method for writing out data. With the same sample data:
mytable = table(time, data, 'VariableNames', titles);
writetable(mytable, 'myfile.xlsx');

can someone suggest an idea on printing blanks in an xls file?

still fairly new to matlab, picked up this data analysis code from someone and I had to add in new functions.
for one function I'm calculating the average of every 3 entries in one column and print the result on another column. so it would be something like this
1 -1
3 -1
5 =(1+3+5)/3
7 -1
1 -1
1 =(7+1+1)/3
4 -1
what I wish to do is to print a blank in the cells that have -1. my first thought was to just assign string values to my results instead of ints. this didn't work because I think there is a line of code in there somewhere that converts everything to ints.
another possible solution is just to reopen the file and loop through all cells replacing any -1's with blank strings, though I'm not sure how to do this, and it's inefficient.
as last resort, I guess I can always tell the user of this xls sheet to use the find/replace function in excel before processing it.
edit: partial code of the save part:
data = [data.time, data.avg_time'];
data2 = num2cell(data);
data3 = {'t', 'avg t'};
data = [data3; data2];
xlswrite([filename, '.xls'], data);
I misunderstood your question (i thought of replacing NaN's with -1, thanks Amro).
You can use this:
A(A(:,2)==-1,2)=NaN
where A is the matrix you created first.
Hope it helps you :)

Resources