Making a vector out of excel columns using python - excel

Hi everyone,
I started using Python a couple of days ago because I need to process some Excel data: I have to automatically update certain cells in one file with data from another.
However, I'm stuck. I have barely programmed before and this is my first time using Python, but my job requires me to find a solution, so I'm trying to make it work even though it's outside my field of expertise.
I used the xlrd library, opened my file, and managed to print the columns I need. However, I can't find a way to put those columns into a matrix so that I can handle the data like this:
Matrix = [DataColumnA DataColumnG DataColumnH], of size [nrows x 3]
For now I have 3 separate outputs for the 3 columns I need, and I'm trying to join them into one big matrix.
So far my code looks like this:
import xlrd
# open both workbooks and select the sheets to compare
workbook = xlrd.open_workbook("190219_serviciosWRAmanualV5.xls")
worksheet = workbook.sheet_by_name("ServiciosDWDM")
workbook2 = xlrd.open_workbook("Potencia2.xlsx")
worksheet2 = workbook2.sheet_by_name("Hoja1")
filas = worksheet.nrows
filas2 = worksheet2.nrows
columnas = worksheet.ncols
# print the three columns of interest, starting from the third row
for row in range(2, filas):
    Equipo_A = worksheet.cell(row, 12).value
    Client_A = worksheet.cell(row, 13).value
    Line_A = worksheet.cell(row, 14).value
    print(Equipo_A, Line_A, Client_A)
So far, as mentioned above, I have only managed to get the data in those columns, which is what I'm printing.
What I mainly need to do is read the cell in the first row of Column A and look for it in the other Excel file. If the names match, I have to check that, for the same row in file 1, the data in both ColumnG and ColumnH is the same as the data in the second file.
If they match, I have to update Column J in the first file with the data from the second file.
My other approach is to retrieve the value of the cell in ColumnA and look for it in column A of the second file, then use an if statement to check whether Columns G and H are equal to Column C of the second file, and so on.
The problem is that I have no idea how to pinpoint the position of the cell and extract the data to build the condition for this second approach.
I'm not sure whether building that matrix is a good approach or whether the second way is better, so any suggestion would be greatly appreciated.
Thank you in advance!
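One possible way to approach this, as a minimal sketch rather than a definitive solution: read each sheet into a list of rows (with xlrd, something like `[worksheet.row_values(r) for r in range(2, worksheet.nrows)]`), keep only the columns of interest, and index the second file by its name column so that each lookup is a dictionary access. The helper names, the sample data and the column positions used for the second file are all assumptions for illustration and need to be adjusted to the real layout of both files.

```python
def select_columns(rows, col_indexes):
    """Build an [nrows x len(col_indexes)] matrix (list of lists) from full rows."""
    return [[row[c] for c in col_indexes] for row in rows]


def build_lookup(rows, key_col):
    """Index rows by the value in key_col so a name can be found without looping."""
    return {row[key_col]: row for row in rows}


def find_updates(matrix, lookup, check_cols, value_col):
    """Return {row position in file 1: new value} for rows that match file 2.

    matrix     -- rows of [name, g, h] taken from file 1
    lookup     -- dict from name to the full row of file 2
    check_cols -- hypothetical positions in file 2 to compare against g and h
    value_col  -- hypothetical position in file 2 holding the value to copy
    """
    updates = {}
    for i, (name, g, h) in enumerate(matrix):
        other = lookup.get(name)
        if other is None:
            continue  # name not present in the second file
        if [g, h] == [other[c] for c in check_cols]:
            updates[i] = other[value_col]
    return updates


# Made-up sample data standing in for the two worksheets.
file1_rows = [
    ["NodeA", "x", "x", "x", "x", "x", "L1", "C1"],
    ["NodeB", "x", "x", "x", "x", "x", "L2", "C2"],
    ["NodeC", "x", "x", "x", "x", "x", "L3", "C3"],
]
file2_rows = [
    ["NodeA", "L1", "C1", -3.5],
    ["NodeB", "L9", "C2", -7.0],  # first checked value differs, so no update
]

matrix = select_columns(file1_rows, [0, 6, 7])  # columns A, G, H
lookup = build_lookup(file2_rows, key_col=0)
updates = find_updates(matrix, lookup, check_cols=[1, 2], value_col=3)
print(matrix)
print(updates)
```

Here `updates` tells you which rows of file 1 should receive a new Column J value. Two things to keep in mind: xlrd only reads workbooks, so writing the result back needs a separate library, and xlrd 2.0 and later read only .xls files, not .xlsx.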

Related

Reading an Excel file with merged cells in Python

I have an Excel table of the following type (the problem described below is caused by the presence of merged cells).
I am using read_excel from pandas to read it.
What I want: I would like to use the values in the first column as an index, and to have the values in the third column combined into one cell, e.g. like here.
What I get from directly applying read_excel can be seen here.
If needed, the code used to read the file is below (I am reading it from Google Drive in Google Colab):
import pandas as pd
path = '/content/drive/MyDrive/ExampleFile.xlsx'
pd.read_excel(path, header=0, index_col=0)
Could you please help?
Please let me know if anything in the question is unclear.
Here is one way to accomplish it. I created an xls file similar to yours; the first column has the heading sno.
import pandas as pd
# read the sheet (path is the file location from the question)
df = pd.read_excel(path)
# fill the null values with values from previous rows
df = df.ffill()
# combine the rows where class is the same and create a new column
df = df.assign(comb=df.groupby(['class'])['type'].transform(lambda x: ','.join(x)))
# drop the duplicated rows
df2 = df.drop_duplicates(subset=['class', 'comb'])[['class', 'comb']]
class comb
0 fruit apple,orange
2 toys car,truck,train
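A self-contained variant of the same idea, as a sketch only: the sample frame below is made up to mimic what read_excel typically returns for merged cells (only the first row of each merged block holds a value, the rest are missing), and the column names sno, class and type are assumed from the answer above. It forward-fills the merged columns and then joins the types per class with groupby and agg, which gives one row per class directly.

```python
import pandas as pd

# Made-up data shaped like a sheet whose first two columns contain merged cells.
df = pd.DataFrame({
    "sno": [1, None, 2, None, None],
    "class": ["fruit", None, "toys", None, None],
    "type": ["apple", "orange", "car", "truck", "train"],
})

# Propagate the value of each merged block down to the rows it covers.
df[["sno", "class"]] = df[["sno", "class"]].ffill()

# Join all types belonging to the same class into one comma-separated cell.
combined = (
    df.groupby("class", sort=False)["type"]
    .agg(",".join)
    .reset_index(name="comb")
)
print(combined)
```

Only the merged columns are forward-filled here, so genuinely empty cells in the type column would stay empty rather than inherit the value above them.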

Specifying a common range for xlsread function in MATLAB

I am trying to figure out how to specify a common range for the xlsread() function in MATLAB.
Usually I use n=xlsread('filename','#sheet','A1:A10'), but I have quite a lot of data in the same sheet and I'd like to know whether I can specify the row range once, i.e. if all my data is in rows '1:10', I want to specify 1:10 as the range and only pass the letter of each column.
I was thinking of doing it as follows:
function [a,b,c]=getdata(filename,'1:10')
a=xlsread(filename,1,'A:A'???)
b=xlsread(filename,1,'B:B'???)
c=xlsread(filename,1,'C:C'???)
end
After some research I could not find any information on how this is done.
Thanks in advance,
Greg
If you want to read rows 1 to 10 of column A, use:
data = xlsread(filename, 1, 'A1:A10');
If you want to read rows 1 to 10 of all columns, use:
data = xlsread(filename, 1, '1:10');
If you want to read rows 1 to 10 of, say, the first three columns A, B, and C, use:
data = xlsread(filename, 1, 'A1:C10');
Using dynamic variable names is always a bad idea. Read this for an explanation. But if you still want to create a, b, c and so on depending on the number of columns in the Excel file, you can use:
for k = 1:size(data,2)
    assignin('caller', char(96+k), data(:,k)); % or char(64+k) for uppercase letters
end
The above works only if the number of columns is less than or equal to 26, so it may only be feasible when you're dealing with a few columns. I still recommend avoiding it.

Import data from excel to matlab row by row

Is there any function that can import data from Excel row by row?
I used to work with xlsread, but it won't work for this case unless I use it in a function that takes all the columns and groups all the elements in the same row together...
Edit: I was able to do it using simple xlsread by the following code:
num = xlsread(excel_file,'B2:BI174');
row1=num(1:173:end);
It is tempting to read the data one row at a time, but that means you will waste time due to file access overhead. It's a lot faster to read all at once and re-pack into a cell array:
allData = xlsread('filename.xls');
oneRowPerElementCell = mat2cell(allData, ones(size(allData,1),1), size(allData,2));
Read the xlsread documentation here for how to read a block from an Excel file.
Example: to read the first row from the 1st to the 26th column, use:
row1 = xlsread('filename.xlsx',sheet_no,'A1:Z1');

script task in SSIS to import excel spreadsheet

I have reviewed the questions that may have had my answer and unfortunately they don't seem to apply. Here is my situation. I have to import worksheets from my client. In columns A, C, D, and AA the client has the information I need. The balance of the columns have what to me is worthless information. The column headers are consistent in the four columns I need, but are very inconsistent in the columns that don't matter. For example cell A1 contains Division. This is true across all of the spreadsheets. Cell B1 can contain anything from sleeve length to overall length to fit. What I need to do is to import only the columns I need and map them to an SQL 2008 R2 table. I have defined the table in a stored procedure which is currently calling an SSIS function.
The problem is that when I try to import a spreadsheet that has different column names, the SSIS package fails and I have to go back in and run it manually to get the fields set up right.
I cannot imagine that what I am trying to do has not been done before. Just so the magnitude is not lost, I have 170 users who have over 120 different spreadsheet templates.
I am desperate for a workable solution. I can do everything after getting the file into my table in SQL. I have even written the code to move the files back to the FTP server.
I put together a post describing how I've used a Script task to parse Excel. It has allowed me to import decidedly non-tabular data into a data flow.
The core concept is that you use the JET or ACE provider and simply query the data out of an Excel worksheet or named range. Once you have that, you have a dataset you can walk through row by row and perform whatever logic you need. In your case, you can skip row 1 for the header and then import only columns A, C, D and AA.
That logic would go in the ExcelParser class. So the foreach loop on line 71 would probably be distilled down to something like this (code approximate):
// This gets the value of column A
current = dr[0].ToString();
// this assigns the value of current into our output row at column 0
newRow[0] = current;
// This gets the value of column C
current = dr[2].ToString();
// this assigns the value of current into our output row at column 1
newRow[1] = current;
// This gets the value of column D
current = dr[3].ToString();
// this assigns the value of current into our output row at column 2
newRow[2] = current;
// This gets the value of column AA
current = dr[26].ToString();
// this assigns the value of current into our output row at column 3
newRow[3] = current;
You obviously might need to do type conversions and such here, but that's the core of the parsing logic.
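The indexes 0, 2, 3 and 26 used above follow from reading column labels as bijective base-26 numbers and then subtracting one to get a zero-based position. A small Python sketch of that arithmetic, in case other columns are needed later (the function name `column_index` is hypothetical, and Python is used only to illustrate the calculation; it is not part of the SSIS script):

```python
def column_index(letters):
    """Convert an Excel column label such as 'A' or 'AA' to a zero-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Each letter is a digit from 1 to 26; shift earlier digits one place.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


wanted = ["A", "C", "D", "AA"]
indexes = [column_index(label) for label in wanted]
print(indexes)
```

The same loop translates directly to C# if the column list ever needs to be configurable instead of hard-coded.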

Can someone suggest an idea on printing blanks in an xls file?

I'm still fairly new to MATLAB. I picked up this data analysis code from someone and had to add new functions.
For one function, I'm calculating the average of every 3 entries in one column and printing the result in another column, so it would look something like this:
1 -1
3 -1
5 =(1+3+5)/3
7 -1
1 -1
1 =(7+1+1)/3
4 -1
What I wish to do is print a blank in the cells that have -1. My first thought was to assign string values to my results instead of ints. This didn't work, I think because there is a line of code somewhere that converts everything to ints.
Another possible solution is to reopen the file and loop through all cells, replacing any -1's with blank strings, though I'm not sure how to do this, and it's inefficient.
As a last resort, I guess I can always tell the user of this xls sheet to use the find/replace function in Excel before processing it.
Edit: partial code of the save part:
data = [data.time, data.avg_time'];
data2 = num2cell(data);
data3 = {'t', 'avg t'};
data = [data3; data2];
xlswrite([filename, '.xls'], data);
I misunderstood your question at first (I thought of replacing NaNs with -1; thanks, Amro).
You can use this:
A(A(:,2)==-1,2)=NaN
where A is the matrix you created first.
Hope it helps you :)

Resources