Import data from Excel to MATLAB row by row

Is there a function that can import data from Excel row by row?
I used to work with xlsread, but it won't work for this case unless I wrap it in a function that takes all the columns and groups the elements of each row together...
Edit: I was able to do it with plain xlsread using the following code:
num = xlsread(excel_file,'B2:BI174');
row1=num(1:173:end);
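The stride of 173 works because MATLAB stores matrices column by column: when num has 173 rows, linear indices 1, 174, 347, ... are the first element of each column, which together make up row 1. Below is a small Python sketch of that index arithmetic, using a flat list as a stand-in for MATLAB's storage. The helper names and the 3 x 4 sample matrix are illustrative assumptions, not part of the original post.

```python
def column_major(matrix):
    """Flatten a list-of-rows matrix the way MATLAB stores it: column by column."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


def row_by_stride(flat, n_rows, row):
    """Pick one row (0-based) out of column-major storage by stepping n_rows at a time."""
    return flat[row::n_rows]


# Hypothetical 3 x 4 matrix standing in for the 173 x 60 block read by xlsread
matrix = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
]
flat = column_major(matrix)
first_row = row_by_stride(flat, n_rows=3, row=0)
second_row = row_by_stride(flat, n_rows=3, row=1)
print(first_row)
print(second_row)
```

In MATLAB terms, row k would be num(k:173:end), although num(k,:) selects the same values more directly.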

It is tempting to read the data one row at a time, but that means you will waste time due to file access overhead. It's a lot faster to read all at once and re-pack into a cell array:
allData = xlsread('filename.xls');
oneRowPerElementCell = mat2cell(allData, ones(size(allData,1),1), size(allData,2));

See the xlsread documentation for how to read a block from an Excel file.
Example: to read the first row from the 1st to the 26th column, use:
row1 = xlsread('filename.xlsx',sheet_no,'A1:Z1');

Related

How to collect cell values in excel and make them into one column

I'm brand new to coding and to this forum, so please accept my apologies in advance for being a newbie and probably not understanding what I'm supposed to say!
I was asked a question earlier which I didn't know how to approach. The user was trying to collect cell values from multiple rows in Excel (split out by a delimiter) and then create one complete column with a single value in each row. Example in picture1 below. Source file is how the data is received and output is what the user is trying to do with it:
I hope I have explained that correctly. I'm looking for some Python code that will automate it. There could be thousands of values that need putting into rows.
Thanks in advance!
Andy
Have a look at the openpyxl package:
https://openpyxl.readthedocs.io/en/stable/index.html
This allows you to directly access cells in your Excel sheet from within Python.
As some of your cells seem to contain multiple values separated by semicolons, you could read the cells as strings and use
splitstring = somelongstring.split(';')
to separate the values. This results in a list containing the separated values.
Basic manipulations using this package are described in this tutorial:
https://openpyxl.readthedocs.io/en/stable/tutorial.html
Edit:
An example iterating over all columns in a worksheet would be:
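from openpyxl import load_workbook
wb = load_workbook('test.xlsx')
ws = wb.active  # iter_cols is a worksheet method, not a workbook method
for col in ws.iter_cols(values_only=True):
    for value in col:
        do_something(value)  # placeholder for your own processing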
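Putting the two ideas from this answer together (read the cells, then split each one on the delimiter), a minimal sketch could look like the following. The function names, the file paths passed to flatten_workbook, and the choice to write the result to column A of a new workbook are assumptions made for illustration. The openpyxl import sits inside the function so the splitting logic can be tried on plain lists first.

```python
def split_cells(values, delimiter=";"):
    """Split every delimited cell value and return one flat list of single values."""
    flat = []
    for value in values:
        if value is None:            # empty cells come back as None
            continue
        for part in str(value).split(delimiter):
            part = part.strip()
            if part:                 # skip blanks left by trailing delimiters
                flat.append(part)
    return flat


def flatten_workbook(src_path, dst_path, delimiter=";"):
    """Read every cell of the active sheet and write the split values to column A."""
    # Imported here so split_cells can be used without openpyxl installed
    from openpyxl import Workbook, load_workbook

    ws = load_workbook(src_path).active
    cells = [v for row in ws.iter_rows(values_only=True) for v in row]

    out = Workbook()
    for single in split_cells(cells, delimiter):
        out.active.append([single])  # one value per output row
    out.save(dst_path)


# Hypothetical cell contents standing in for the source file
sample = ["A1;A2;A3", None, "B1", "C1; C2;"]
print(split_cells(sample))
```

Building the flat list in memory first keeps the loop simple; for a few thousand values this should be quick, though that is an expectation rather than something measured here.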
I was able to find some code online and adapt it to get what I needed. Here is the code I ended up with:
import pandas as pd
from itertools import chain
iris = pd.read_csv('iris.csv')
# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))
# calculate lengths of splits (needed only if other columns must be repeated to match)
lens = iris['Order No'].str.split(',').map(len)
# create new dataframe with one value per row
res = pd.DataFrame({'Order No': chainer(iris['Order No'])})

Making a vector out of excel columns using python

Hi everyone,
I just started with Python a couple of days ago because I need to handle some Excel data in order to automatically update certain cells in one file from another.
However, I'm kind of stuck since I have barely programmed before, and it's my first time using Python as well, but my job requires me to find a solution and I'm trying to make it work even though it's not my field of expertise.
I used the xlrd library, imported my file and managed to print the columns I need. However, I can't find a way to put those columns into a matrix in order to handle the data like this:
Matrix =[DataColumnA DataColumnG DataColumnH] in the size [nrows x 3]
As for now, I have 3 different outputs for the 3 different columns I need, but I'm trying to join them together into one big matrix.
So far my code looks like this:
import xlrd
workbook = xlrd.open_workbook("190219_serviciosWRAmanualV5.xls")
worksheet = workbook.sheet_by_name("ServiciosDWDM")
workbook2 = xlrd.open_workbook("Potencia2.xlsx")
worksheet2 = workbook2.sheet_by_name("Hoja1")
filas = worksheet.nrows
filas2 = worksheet2.nrows
columnas = worksheet.ncols
for row in range(2, filas):
    Equipo_A = worksheet.cell(row, 12).value
    Client_A = worksheet.cell(row, 13).value
    Line_A = worksheet.cell(row, 14).value
    print(Equipo_A, Line_A, Client_A)
So far, as mentioned above, I have only managed to get the data from the columns, which is what I'm printing.
What I'm trying to do, or the main thing I need to do is to read the cell of the first row in Column A and look for it in the other excel file... if the names match, I would have to validate that for the same row (in file 1) the data in both the ColumnG and ColumnH is the same as the data in the second file.
If they match I would have to update Column J in the first file with the data from the second file.
My other approach is to retrieve the value of the cell in ColumnA and look for it in the column A of the second file, then I would make an if conditional to see if ColumnsG and H are equal to Column C of 2nd file and so on...
The thing is, I have no idea how to pinpoint the position of the cell and extract the data to write the conditional for this second approach.
I'm not sure if by making that matrix my approach is okay or if the second way is better, so any suggestion would be absolutely appreciated.
Thank you in advance!
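One way to approach both ideas in this question is to treat each sheet as a list of rows, build the nrows x 3 matrix by picking columns, and index the second file by its key column so each lookup is a dictionary access. The sketch below works on plain lists; every column position (A, G, H in the first file, and the key, check and source columns in the second) is an assumption to adjust to the real layout, and the sample rows are invented.

```python
def build_matrix(rows, columns):
    """Return an nrows x len(columns) matrix (list of lists) from the chosen columns."""
    return [[row[c] for c in columns] for row in rows]


def find_updates(rows1, rows2, key1=0, check1=(6, 7), key2=0, check2=(1, 2), source2=3):
    """Yield (row index in file 1, new value) where key and check columns all match.

    All column positions are assumptions, not taken from the real files.
    """
    # Index the second file by its key column so each lookup is a dict access
    lookup = {row[key2]: row for row in rows2}
    for i, row in enumerate(rows1):
        match = lookup.get(row[key1])
        if match is None:
            continue                 # name not present in the second file
        if all(row[a] == match[b] for a, b in zip(check1, check2)):
            yield i, match[source2]


# Hypothetical data: file 1 has columns A..J, file 2 has columns A..D
rows1 = [
    ["node1", "", "", "", "", "", "eq1", "cl1", "", "old"],
    ["node2", "", "", "", "", "", "eq2", "cl2", "", "old"],
    ["node3", "", "", "", "", "", "eq3", "cl3", "", "old"],
]
rows2 = [
    ["node1", "eq1", "cl1", -3.5],
    ["node2", "eqX", "cl2", -7.0],
]

matrix = build_matrix(rows1, (0, 6, 7))   # columns A, G, H
updates = list(find_updates(rows1, rows2))
print(matrix)
print(updates)
```

With xlrd, the row lists could be produced with something like [sheet.row_values(r) for r in range(sheet.nrows)]. Note that xlrd only reads workbooks, so writing the updated values back to column J would need a separate step with another library.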

Read Excel file and assign each column to a variable in MATLAB

I am having a simple problem reading Excel data that contains strings, long strings, and numbers. I need to turn each column (there are 11) into a separate column-vector variable so that I can plot them against each other, or in combination, in MATLAB.
The problem is reading the file and creating the 11 column vectors: when I assign a variable, the header comes along with it.
Code:
%fid = fopen('Data_Link.xlsx');
[num,txt,raw] = xlsread('Data_Link.xlsx');
%fclose(fid);
% Extract data from readData
A = raw(:,1);
B = raw(:,2);
C = raw(:,6);
So I need the variables without header
Data file is truncated and given here.
Can anyone help me?
You can use readtable as ThP suggested. But if you want to use xlsread and you want your data without the header, you just need to remove the first row, as in the example below:
%fid = fopen('Data_Link.xlsx');
[num,txt,raw] = xlsread('Data_Link.xlsx');
%fclose(fid);
% Extract data from readData
A = raw(2:end,1);
B = raw(2:end,2);
C = raw(2:end,6);
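Note that each variable now holds the data from row 2 to the last row. Because raw is a cell array, A, B and C are cell arrays too; a purely numeric column can be converted with cell2mat before plotting.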
You can use readtable instead of xlsread.
Using
T = readtable('Data_Link.xlsx')
will result in a table with a variable for each column. For example T.Year would hold the values from the ‘Year’ column and T.Title would hold the values from the ‘Title’ column, etc.

Openpyxl to check for keywords, then modify next to cells to contain those keywords and total found

I'm using python 3.x and openpyxl to parse an excel .xlsx file.
For each row, I check a column (C) to see if any of those keywords match.
If so, I add them to a separate list variable and also determine how many keywords were matched.
I then want to add the actual keywords into the next cell, and the total of keywords into the cell after. This is where I am having trouble, actually writing the results.
contents of the keywords.txt and results.xlsx file
here
import openpyxl
# Here I read a keywords.txt file and input them into a keywords variable
# I throwaway the first line to prevent a mismatch due to the unicode BOM
with open("keywords.txt") as f:
    f.readline()
    keywords = [line.rstrip("\n") for line in f]
# Load the workbook
wb = openpyxl.load_workbook("results.xlsx")
ws = wb.get_sheet_by_name("Sheet")
# Iterate through every row, only looking in column C for the keyword match.
for row in ws.iter_rows("C{}:E{}".format(ws.min_row, ws.max_row)):
    # if there's a match, add to the keywords_found list
    keywords_found = [key for key in keywords if key in row[0].value]
    # if any keywords found, enter the keywords in column D
    # and how many keywords into column E
    if len(keywords_found):
        row[1].value = keywords_found
        row[2].value = len(keywords_found)
Now, I understand where I'm going wrong: ws.iter_rows(..) returns a tuple, which can't be modified. I figure I could use two for loops, one for each row and another for the columns in each row, but this test is a small example of a real-world scenario where the number of rows is in the tens of thousands.
I'm not quite sure which is the best way to go about this. Thank you in advance for any help that you can provide.
Use the ws['C'] and then the offset() method of the relevant cell.
Thanks Charlie for the offset() tip. I modified the code slightly and now it works a treat.
for row in ws.iter_rows("C{}:C{}"...):
    for cell in row:
        ....
        if len(keywords_found):
            cell.offset(0,1).value = str(keywords_found)
            cell.offset(0,2).value = str(len(keywords_found))
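For reference, here is a fuller sketch of the ws['C'] plus offset() approach suggested above. The helper names, the sample keywords and the choice to join the matches with ", " are assumptions. The join matters because a Python list cannot be stored directly as a cell value, which is why the snippet above wraps it in str(). The openpyxl import is kept inside the function so the matching logic can be checked on its own.

```python
def match_keywords(text, keywords):
    """Return (comma-joined matches, match count) for one cell's text."""
    if text is None:                 # empty cells have value None
        return "", 0
    found = [key for key in keywords if key in str(text)]
    return ", ".join(found), len(found)


def annotate_sheet(path, keywords, sheet_name="Sheet"):
    """Write the matches into column D and their count into column E."""
    # Imported lazily so match_keywords can be tested without openpyxl installed
    import openpyxl

    wb = openpyxl.load_workbook(path)
    ws = wb[sheet_name]
    for cell in ws["C"]:             # every cell in column C, one pass
        joined, count = match_keywords(cell.value, keywords)
        if count:
            cell.offset(row=0, column=1).value = joined  # column D
            cell.offset(row=0, column=2).value = count   # column E
    wb.save(path)


# Hypothetical keywords and cell text
keywords = ["error", "timeout", "disk"]
print(match_keywords("disk error on node 3", keywords))
print(match_keywords("all good", keywords))
```

This is still a single loop over the rows, so tens of thousands of rows should not need a nested row/column loop.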

script task in SSIS to import excel spreadsheet

I have reviewed the questions that may have had my answer and unfortunately they don't seem to apply. Here is my situation. I have to import worksheets from my client. In columns A, C, D, and AA the client has the information I need. The balance of the columns have what to me is worthless information. The column headers are consistent in the four columns I need, but are very inconsistent in the columns that don't matter. For example cell A1 contains Division. This is true across all of the spreadsheets. Cell B1 can contain anything from sleeve length to overall length to fit. What I need to do is to import only the columns I need and map them to an SQL 2008 R2 table. I have defined the table in a stored procedure which is currently calling an SSIS function.
The problem is that when I try to import a spreadsheet that has different column names, the SSIS package fails and I have to go back in and run it manually to get the fields set up right.
I cannot imagine that what I am trying to do has not been done before. Just so the magnitude is not lost, I have 170 users who have over 120 different spreadsheet templates.
I am desperate for a workable solution. I can do everything after getting the file into my table in SQL. I have even written the code to move the files back to the FTP server.
I put together a post describing how I've used a Script task to parse Excel. It has allowed me to import decidedly non-tabular data into a data flow.
The core concept is that you use the JET or ACE provider and simply query the data out of an Excel worksheet or named range. Once you have that, you have a dataset you can walk through row by row and perform whatever logic you need. In your case, you can skip row 1 for the header and then only import columns A, C, D and AA.
That logic would go in the ExcelParser class. So, the Foreach loop on line 71 would probably be distilled down to something like this (code approximate):
// This gets the value of column A
current = dr[0].ToString();
// this assigns the value of current into our output row at column 0
newRow[0] = current;
// This gets the value of column C
current = dr[2].ToString();
// this assigns the value of current into our output row at column 1
newRow[1] = current;
// This gets the value of column D
current = dr[3].ToString();
// this assigns the value of current into our output row at column 2
newRow[2] = current;
// This gets the value of column AA
current = dr[26].ToString();
// this assigns the value of current into our output row at column 3
newRow[3] = current;
You obviously might need to do type conversions and such here, but that's the core of the parsing logic.
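The positional indexes used above (0, 2, 3 and 26) follow from treating Excel column labels as base-26 numbers. The short sketch below checks that arithmetic. It is written in Python rather than the C# of the Script task, and the function names and the 30-column sample row are illustrative assumptions.

```python
def column_index(letters):
    """Convert an Excel column label such as 'A' or 'AA' to a 0-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Labels are base-26 with digits A=1 .. Z=26
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def pick_columns(row, labels=("A", "C", "D", "AA")):
    """Return only the wanted columns of one row, in label order."""
    return [row[column_index(label)] for label in labels]


wanted = [column_index(label) for label in ("A", "C", "D", "AA")]
print(wanted)

# Hypothetical 30-column row standing in for one DataRow
row = [f"col{i}" for i in range(30)]
print(pick_columns(row))
```

Selecting by position like this is what lets the import ignore the inconsistent headers in the unwanted columns, as long as A, C, D and AA stay in place across the templates.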

Resources