Using xlrd to iterate through worksheets and workbooks - excel

I am a total noob. I need to grab the same cell value from every other sheet (starting at the third) in a workbook and place the values into another workbook. I keep getting IndexError: list index out of range. There are 20 sheets in the workbook. I have imported xlrd and xlwt.
Code:
sheet_id = 3
output = 0
cellval = enso.sheet_by_index(sheet_id).cell(20,2).value
sheet_cp = book_cp.get_sheet(output)
sheet_cp.write(1, 1, cellval)
book_cp.save(path)
for sheet_id in range(0,20):
    sheet_enso = enso_cp.get_sheet(sheet)
    sheet_cp = book_cp.get_sheet(output)
    sheet_cp.write(1, 1, cellval)
    sheet_id = sheet_id + 2
    output = output + 1

Your problem is most likely here:
sheet_id = 3
cellval = enso.sheet_by_index(sheet_id).cell(20,2).value # row index 20, column index 2
Check the following:
1- Make sure that sheet_id=3 is what you want. Sheet indexes start at 0, so the 3rd sheet has index 2; index 3 is the 4th sheet.
2- Check that cell(20,2) exists in the selected sheet (cell(0,0) is the first cell), i.e. the sheet has at least 21 rows and 3 columns.
Also, you don't need to define sheet_id yourself or increment it inside the loop; reassigning the loop variable has no effect on the next iteration.
Instead, let range() do the stepping. With 20 sheets the valid indexes are 0 to 19, so to visit every other sheet starting at the 3rd, use range(2, 20, 2), where:
range([start], stop[, step])
start: Starting number of the sequence.
stop: Generate numbers up to, but not including this number.
step: Difference between each number in the sequence.
Reference: Python's range() Parameters
To get cellval from every sheet, read it inside the loop.
The final code could be:
output = 0
for sheet_id in range(2, 20, 2): # every other sheet, starting at the 3rd (index 2); the last valid index is 19
    cellval = enso.sheet_by_index(sheet_id).cell(20,2).value # row index 20, column index 2
    #sheet_enso = enso_cp.get_sheet(sheet) # I don't know if you need that for something else
    sheet_cp = book_cp.get_sheet(output)
    sheet_cp.write(1, 1, cellval)
    output = output + 1
book_cp.save(path)
Again, check that cell(20,2) exists in all source sheets to avoid errors.
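To see which sheets a loop over "every other sheet, starting at the third" would touch in a 20-sheet workbook, here is a minimal sketch in plain Python. It only lists index pairs and does not open any workbook; the helper name sheet_pairs and its parameters are hypothetical and are not part of xlrd or xlwt.

```python
def sheet_pairs(sheet_count, first_index=2, step=2):
    """Pair each source sheet index with the output sheet index it feeds.

    Hypothetical helper for illustration only. Sheet indexes are 0-based,
    so a workbook with sheet_count sheets has indexes 0 .. sheet_count - 1.
    """
    source_indexes = range(first_index, sheet_count, step)
    # enumerate() supplies the running output index (0, 1, 2, ...)
    return [(src, out) for out, src in enumerate(source_indexes)]


pairs = sheet_pairs(20)
for src, out in pairs:
    print(f"source sheet index {src} -> output sheet index {out}")
```

Because the stop value equals the sheet count, the largest index produced is 18, which stays inside the valid range 0 to 19.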

Related

Dynamic Range division with two variables

I'm trying to complete division for a dynamic range with two variables and it keeps dividing the dynamic range by the last number in the range. Below is my VBA code.
For i = 2 To 8
    For r = 13 To 19
        If ThisWorkbook.Sheets("Sheet1").Cells(i, 28) = "" Then
            ThisWorkbook.Sheets("Sheet1").Cells(r, 28) = ""
        Else
            ThisWorkbook.Sheets("Sheet1").Cells(r, 28) = ThisWorkbook.Sheets("Sheet1").Cells(i, 28) / Range("$AB$8")
        End If
    Next r
Next i
Essentially it is dividing the last i value (Cell row 8) by the Range("$AB$8") (cells row 19).
What I would like is for each value in rows i to be divided by Range("$AB$8"); in other words, cell (2,28)/AB8, (3,28)/AB8, (4,28)/AB8, etc.
It currently takes the value in cell (8,28), divides it by AB8, and applies the result to all of the defined r rows.
There are a number of issues here - all of which are small tweaks but end up with the wrong result you are seeing.
Your example code is not a dynamic range. You have hardcoded Cells(AB2:AB8) and Cells(AB13:AB19). You just did it in a way that is not obvious.
Also not very obvious is that you are writing the results to a single column. See the pattern here:
Loop 1: i = 2, results may be writing to Cells(AB13:AB19)
[…]
Loop 7: i = 8, results may be writing to Cells(AB13:AB19)
I said "may" because you have the If statement.
Depending on what you really want to happen, the code can be amended in one of these ways:
Instead of the first loop, put a conditional there (e.g. If all cells in that range are blank then …, or if any cells are blank then ...).
Use an Exit For after handling the first blank in the loop.
Also address the column (i.e. spread the results across multiple columns).
Use a single loop (For i = 2 To 8 … and then derive r from i; since rows 2 to 8 map onto rows 13 to 19, r = i + 11).
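The single-loop idea can be sketched in Python, assuming source rows 2 to 8 map onto result rows 13 to 19 (an offset of 11). The dictionary below is a stand-in for column AB, and the function name divide_block is hypothetical; this is an illustration of the row mapping, not the asker's workbook.

```python
def divide_block(column, divisor, first_src=2, last_src=8, offset=11):
    """Return {result_row: column[src_row] / divisor} for each source row.

    column maps row number -> value and stands in for column AB.
    A blank source cell (None or "") yields a blank result.
    """
    results = {}
    for i in range(first_src, last_src + 1):
        value = column.get(i)
        r = i + offset  # row 2 -> row 13, ..., row 8 -> row 19
        results[r] = "" if value in (None, "") else value / divisor
    return results


column_ab = {2: 10, 3: 20, 4: "", 5: 40, 6: 50, 7: 60, 8: 5}
out = divide_block(column_ab, column_ab[8])  # divide everything by AB8
print(out)
```

Each source row is read exactly once, so no result is overwritten by a later pass of an outer loop.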

How to randomly select column that does not contain a specific value in Excel or SPSS

Would anyone know the Excel formula, VBA, or SPSS syntax to do the following:
Create a new variable/column in a dataset or spreadsheet which is populated by the column number (or column title) of a randomly selected column (from a range of 1-42 columns), provided the value in that column for a given row does not contain 99.
In Excel I can do the first step and create random numbers and match these to columns, but I don't know how (or if possible) to 're-roll' a new random number if the initial matched column contains the value 99.
My formula for generating a random number between 1 and 42 to identify a column:
AQ=RANDBETWEEN(1,3)
For a row in Excel using 9-row dummy data: =HLOOKUP(AQ,$A$1:$AP$9,2,FALSE)
Here's an example of how you can re-roll... for the given row, I chose 10 but you can change this however you need
EDIT - now looping thru givenRow:
Sub test()
    Dim randCol As Integer
    Dim givenRow As Long
    Dim saveCol As Integer: saveCol = 44 ' where to store results
    With ThisWorkbook.Worksheets("your sheet name")
        For givenRow = 1 To 100
            Do While True
                ' get column between 1 and 42
                randCol = Int(42 * Rnd + 1)
                ' if not 99 exit
                If .Cells(givenRow, randCol).Value <> 99 Then Exit Do
            Loop
            ' store results in saveCol for givenRow
            .Cells(givenRow, saveCol).Value = randCol
        Next
    End With
End Sub
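For comparison, here is a minimal Python sketch of the same per-row selection. It swaps the re-roll loop for a different technique: it first collects the columns whose value is not 99 and then picks one of them at random, which avoids looping forever when every column in the row holds 99. The function name pick_column and the sample row are hypothetical.

```python
import random


def pick_column(row_values, excluded=99):
    """Return the 1-based number of a random column whose value != excluded.

    Returns None when every column in the row holds the excluded value.
    """
    candidates = [
        col for col, value in enumerate(row_values, start=1) if value != excluded
    ]
    if not candidates:
        return None
    return random.choice(candidates)


row = [99, 3, 99, 7, 1]   # sample data, 5 columns instead of 42
print(pick_column(row))   # one of the columns 2, 4 or 5
```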
Here's how you could go about it in SPSS using Python:
begin program.
import spss, spssaux
import random
# get variable list
vars = spssaux.VariableDict().expand(spss.GetVariableName(0) + " to " + spss.GetVariableName(spss.GetVariableCount()-1))
proceed = True
breakcount = 0
while proceed:
    # generate random integer between 0 and variable count -1, get random variable's
    # name and index-position in dataset
    rng = random.randint(0, spss.GetVariableCount() - 1)
    ranvar = spss.GetVariableName(rng)
    ind = int(vars.index(ranvar))
    # read data from random variable, if value 99 is stored in the variable, go back to the top. if not, compute variable
    # random_column = column number (index +1 NOT index)
    randat = spss.Cursor([ind])
    d = randat.fetchall()
    randat.close()
    data = [str(x).strip('(),') for x in d]
    breakcount += 1
    if "99.0" not in data:
        spss.Submit("compute random_column = %s." %(ind + 1))
        proceed = False
    elif breakcount == 42:
        break
end program.
It iterates through random variables until it finds one without the value 99 in it, then computes the new variable containing the column number.
Edit: Added a break condition so that it doesn't loop infinitely in case every variable contains a 99.

Fill Excel file with for loop

I'm trying to fill an Excel file using a for loop, the logic for filling is to do it every 189 results, for Example:
Fill from A1 to A189
Fill from B1 to B189
At the moment I have a code which works fine with first row A, but the problem is when I try it with B, C, D, etc.
This is my code:
'' I don't need first 4 results.
'' Total rows in this case is 569
For index As Integer = 5 To totalRows Step 1
    Dim column As Integer = 2
    '' var used to know if row is completed and change the ExcelProcess method
    If rowsCompleted = 1 Then
        '' realRows = 569 / 3 rounded down = 189
        '' Flag initial value = 5 and is used as an internal index value instead of index var at for loop.
        If flag <= realRows Then
            '' copy
            '' Excel Range = spreadsheet1.Cells(index, 7)
            '' paste
            '' Excel Range = spreadsheet2.Cells(8 * rowsCompleted - 6, index)
            flag = flag + 1
        Else
            '' copy
            '' Excel Range = spreadsheet1.Cells(index + 2, 7)
            flag = flag + 1
            '' paste
            '' Excel Range = spreadsheet2.Cells(8 * rowsCompleted - 6, flag)
        End If
    Else
        rowsCompleted = rowsCompleted + 1
        flag = 5
    End If
Next
Debugging step by step, I found some important details.
First row contains 190 rows but the first one is not needed so I only need 189 rows and start on 5. second and third row contains 189 rows so there is no problem. Also I need to start on 5 row.
Also I found that with my code the first row ends fine on 190. second row must ends on 379 but I found that ends on 381 or 382. So I think that maybe the problem is with my for loop and index or flag vars.
I also think the problem may be in how I copy the value with this code:
Excel Range = spreadsheet1.Cells(index + 2, 7), because I'm adding + 2.
Why do you need to loop?
Range("A2:D189").Copy
spreadsheet2.range("A2").PasteSpecial xlpastevalues
It's not really clear what you are trying to achieve. At the start you say "Fill from A1 to A189, Fill from B1 to B189", but then you say for columns A to D, which is fine.
The second block of text says:
First row contains 190 rows but the first one is not needed so I only need 189 rows and start on 5. second and third row contains 189 rows so there is no problem. Also I need to start on 5 row.
Also I found that with my code the first row ends fine on 190. second row must ends on 379 but I found that ends on 381 or 382. So I think that maybe the problem is with my for loop and index or flag vars.
I am having a hard time digesting what you mean when you say the first row contains 190 rows but the first is not needed (I assume you want rows 2 to 190?), but then you say that you need it to start on row 5, so I am not sure whether you want it to start from row 2 or row 5.
Then you say the second row must end on 379, which doesn't make much sense other than being roughly 190 doubled.
Can you give a clearer outline of what you want to achieve? What range do you want to populate, and where from?

Openpyxl Keeping formulas but copying values

My workbook has 6 sheets. The last 3 sheets are a bunch of formulas that I currently manually copy and paste values into the first 3 sheets. I am using wb.copy_worksheet() to make the copies and loading the work book as data_only = True. However, when I save, the formulas are all gone due to loading it as data_only. Is there a way I can copy the values but keep the formulas? The sheets are too large to go cell by cell.
Here's my code:
import openpyxl
wb = openpyxl.load_workbook("symbols.xlsx", data_only=True)
ws = wb.get_sheet_names()
print (ws)
Value = ws[0:3] #set equal to first 3 sheets
BB = ws[3:7] #set equal to last 3 sheets
for s in range(0, len(Value)):
    CopyBB = wb.copy_worksheet(wb[BB[s]]) #copy from bb sheet
    CopyBB.title = Value[s]
myorder = [6, 7, 8, 3, 4, 5, 0, 1, 2] #this is to reorder the sheets that got copied.
wb._sheets =[wb._sheets[i] for i in myorder]
wb.remove_sheet(wb.worksheets[8])
wb.remove_sheet(wb.worksheets[7])
wb.remove_sheet(wb.worksheets[6])
wb.worksheets[0].title = "Securities Values"
wb.worksheets[1].title = "Indices Values"
wb.worksheets[2].title = "Currencies Values"
return wb.save("symbols.xlsx")
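I had a similar issue. You have to assign a formula string such as =MVSM!A1, where MVSM is the sheet name and A1 is the cell reference (put the sheet name in single quotes, as in ='My Sheet'!A1, if it contains spaces). Just use that when assigning the cell value with openpyxl.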
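A minimal sketch of building such a cross-sheet reference string in Python. The helper name cross_sheet_formula is hypothetical; it always wraps the sheet name in single quotes and doubles any apostrophe inside it, which is how Excel expects quoted sheet names to be written.

```python
def cross_sheet_formula(sheet_name, cell_ref):
    """Build a formula string that points at cell_ref on another sheet.

    Hypothetical helper: the sheet name is always quoted, and an apostrophe
    inside the name is doubled, as Excel requires inside quoted sheet names.
    """
    escaped = sheet_name.replace("'", "''")
    return f"='{escaped}'!{cell_ref}"


print(cross_sheet_formula("MVSM", "A1"))
print(cross_sheet_formula("Securities Values", "B2"))
```

With openpyxl, a string that starts with "=" is stored as a formula when it is assigned to a cell, so the result can be written with something like ws["A1"] = cross_sheet_formula("MVSM", "A1") on a workbook that was loaded without data_only=True.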

How can I lookup data from one column, when the value I'm referencing changes columns?

I want to do an INDEX-MATCH-like lookup between two documents, except my MATCH's index array doesn't stay in one column.
In Vague-English: I want a value from a known column that matches another value that may be found in any column.
Refer to the image below. Let's call everything to the left of the bold vertical line on column H doc1, and the right side will be doc2.
Doc2 has a column "Find This", which will be the INDEX's array. It is compared with "ID1" from doc1 (note that the values in "Find This" will not be in the same order as column ID1, but it's easier to understand this way).
The "[Result]" column in doc2 will be the value from doc1's "Want This" column from the row that matches "FIND THIS" ...However, sometimes the value from "FIND THIS" is not in the "ID1" column, and is instead in "ID2","ID3", etc.
So, I'm trying to generate Col K from Col J. This would be like pressing Ctrl+F and searching for a value in Col J, then taking the value from Col D in that row and copying it to Col K.
I made identical values from a column the same color in the other doc to make it easier to visualize where they are coming from.
Note also that in column F of doc1, the same value from doc2's "Find This" can be found after some other text.
Also note that the column headers are only there as examples, the ID columns are not actually numbered.
I would simply hard-code the correct column to search from, but I'm not in control of doc1, and I'm worried that future versions may have new "ID" columns, with others being removed.
I'd prefer this to be a solution in the form of a formula, but VB will do.
To generate column K based on given values of column J then you could use the following:
=INDEX(doc1!$D$2:$D$14,SUMPRODUCT((doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14))-1)
Copy that formula down as far as you need to go.
It basically returns only the row where a match for the column J value is found. We then look up that row in your D range with INDEX to get your value in K.
UPDATE:
If you are working with non-unique entries in column J, that is, the value can be found in multiple rows and columns, consider using the following to return the last row where the J value is found:
=INDEX(doc1!$D$2:$D$14,AGGREGATE(14,6,(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14),1)-1)
UPDATE 2:
And to return the first row where what you are looking in column J is found use:
=INDEX($D$2:$D$14,AGGREGATE(15,6,1/($B$2:$H$14=J2)*ROW($B$2:$H$14)-1,1))
Thanks to Scott Craner for the hint on the minimum formula.
To determine whether the data from column J is UNIQUE in your range B2:H14, you can enter this array formula. To enter an array formula you need to press CTRL+SHIFT+ENTER at the same time, not just ENTER. You will know you have done it right when you see {} around your formula in the formula bar. You cannot add the {} manually.
=IF(MAX(COUNTIF($B$2:$H$14,J2:J14))>1,"DUPLICATES","UNIQUE")
UPDATE 3
AGGREGATE - A relatively new function to me, but it goes back to Excel 2010. AGGREGATE is 19 functions rolled into one. It would be nice if they all worked the same way, but they do not. I think it is the functions numbered 14 and up that perform the same way as an array formula (or a CSE formula, if you prefer). The nice thing is that you do not need to use CSE when entering or editing them. SUMPRODUCT is another example of a regular formula that performs array-formula calculations.
The meat of this explanation, I believe, is what is happening inside the AGGREGATE brackets. If you click on the link you will get an idea of what the first two arguments are. The first defines which function you are using, and the second tells AGGREGATE how to deal with errors, hidden rows, and some other nested functions. That is the relatively easy part. What I believe you want to know is what is happening with this:
(doc1!$B$2:$H$14=J2)*ROW(doc1!$B$2:$H$14)
For illustrative purposes, let's reduce this formula to something a little smaller in scale that does the same thing. I'll avoid starting in A1, because starting there makes counting easier (it is the 1st row and first column) and hides some special considerations. By placing the example range elsewhere, you can see those considerations.
What I want to know is in which row of column B each of the items listed in column C occurs.
    | B        | C
3   | DOG      | PLATYPUS
4   | CAT      | DOG
5   | PLATYPUS |
The full formula for our mini example would be:
=INDEX($B$3:$B$5,AGGREGATE(14,6,($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
And we are going to look at the following part of it as an array:
{=($B$3:$B$5=C2)*ROW($B$3:$B$5)}
So the first set of brackets is going to be a Boolean array, as you noted. Every cell that is TRUE will stay TRUE until it is forced into a math calculation. When that happens, TRUE becomes 1 and FALSE becomes 0. If that formula were entered as a CSE formula and placed in D2, it would break down as follows:
FALSE X 3
FALSE X 4
TRUE X 5
The 3, 4 and 5 come from ROW() returning the row number it is dealing with at that point in the array math operation. Little trick: we could have used ROW(1:3). You just need to make sure the size of the array matches! This is not matrix math; it is just straight-across multiplication. And since the Boolean is now experiencing a math operation, we are now looking at:
0 X 3 = 0
0 X 4 = 0
1 X 5 = 5
So the array {0,0,5} gets handed back to AGGREGATE for more processing. The important thing to note here is that it contains ONLY 0 and the individual row numbers where we had a match. In the first AGGREGATE formula, function 14 was chosen, which is the LARGE function. We also told it to ignore errors, which in this particular case does not matter. After the array, there was a ,1) to finish off the AGGREGATE function. The 1 tells AGGREGATE that we want the 1st largest number when the array is sorted from largest to smallest. If that number were 2, it would be the 2nd largest number, and so on. So the last row, or the only row, on which something is found is returned. In our small example it would be 5.
But wait, that 5 was buried inside another function called INDEX, and in our small example that INDEX formula would be:
=INDEX($B$3:$B$5,AGGREGATE(...)-2)
Well, we know that the range is only 3 rows long, so asking for the 5th row would have Excel smacking you upside the head with an error because your index number is out of range. So in comes the header row correction of -1 in the original formula, or -2 for the small example, and what we really see for the small example is:
=INDEX($B$3:$B$5,5-2)
=INDEX($B$3:$B$5,3)
And here is a weird bit of info: that last one does not become PLATYPUS... it becomes the cell reference =B5, which pulls PLATYPUS. But that little nuance is a story for another time.
Now, in the comments Scott essentially told me to invert the expression to force an error in order to get the first row. This is an important step for the AGGREGATE, and it had me running in circles for a while. So the full equation for the first-row option in our mini example is:
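The LARGE walk-through above can be mirrored in a few lines of Python. This is only a sketch of the arithmetic, assuming the mini example's data in B3:B5 and PLATYPUS as the lookup value; the function name match_rows is hypothetical.

```python
def match_rows(values, first_row, target):
    """Mimic (range = target) * ROW(range): the row number on a match, else 0."""
    return [(value == target) * row
            for row, value in enumerate(values, start=first_row)]


column_b = ["DOG", "CAT", "PLATYPUS"]        # stands in for B3:B5
products = match_rows(column_b, 3, "PLATYPUS")
last_row = max(products)                      # like AGGREGATE(14, 6, ..., 1), i.e. LARGE
position = last_row - 2                       # sheet row 5 -> 3rd item of B3:B5
print(products, last_row, column_b[position - 1])
```

The subtraction of 2 plays the same role as the -2 in the INDEX formula: it turns a sheet row number into a position inside a range that starts on row 3.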
=INDEX($B$3:$B$5,AGGREGATE(15,6,1/($B$3:$B$5=C2)*ROW($B$3:$B$5),1)-2)
And what Scott Craner was actually suggesting, which skips one math step, is:
=INDEX($B$3:$B$5,AGGREGATE(15,6,ROW($B$3:$B$5)/($B$3:$B$5=C2),1)-2)
However, since I only realized this after writing all of this up, the explanation will continue with the first of these two equations.
So the important thing to note here is the change from function 14 to function 15, which is SMALL. Think of it as finding the minimum. And this time that 6 plays a huge role, along with the 1/. So our array in the middle this time equates to:
1/FALSE X 3
1/FALSE X 4
1/TRUE X 5
Which then becomes:
1/0 X 3
1/0 X 4
1/1 X 5
Which then has Excel slapping you upside the head again because you are trying to divide by 0:
#DIV/0! X 3
#DIV/0! X 4
1/1 X 5
But you were smart and you protected yourself from that slap upside the head when you told AGGREGATE to ignore errors by using 6 as the second argument! Therefore what is above becomes:
{5}
Since we are performing a SMALL, and we passed ,1) as the closing part of the AGGREGATE, we have essentially said give me the minimum row number or the 1st smallest number of the resulting array when sorted in ascending order.
The rest plays out the same as it did for the LARGE AGGREGATE method. The pitfall I fell into originally is I did not use the 1/ to force an error. As a result, every time I tried getting the SMALL of the array I was getting 0 from all the false results.
SUMPRODUCT works in a very similar fashion, but only works when the result array in the middle returns exactly one non-zero answer. The reason is that the last step of the SUMPRODUCT function is to add all the individual elements of the resulting array. So if you only have one non-zero element, you get that non-zero number. If you had two rows that matched, for instance 12 and 31, then the SUMPRODUCT method would return 43, which is not any of the row numbers you wanted, whereas AGGREGATE LARGE would have told you 31 and AGGREGATE SMALL would have told you 12.
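The difference between the three approaches can be checked with a small Python sketch, using the example of matches on rows 12 and 31. The function name small_ignoring_errors is hypothetical; it imitates ROW/(condition) with errors skipped, which is what option 6 does for AGGREGATE.

```python
def small_ignoring_errors(flags, rows):
    """Imitate AGGREGATE(15, 6, ROW(range)/(condition), 1).

    A False flag causes a division by zero, and those error results are
    skipped, the way option 6 tells AGGREGATE to ignore error values.
    """
    results = []
    for flag, row in zip(flags, rows):
        try:
            results.append(row // int(flag))
        except ZeroDivisionError:
            continue
    return min(results) if results else None


rows = list(range(1, 41))
flags = [row in (12, 31) for row in rows]            # matches on rows 12 and 31
products = [int(flag) * row for flag, row in zip(flags, rows)]

print(max(products))                       # LARGE-style result: 31
print(small_ignoring_errors(flags, rows))  # SMALL-style result: 12
print(sum(products))                       # SUMPRODUCT-style result: 43
```

Without the division trick, the minimum of products would be 0, which is the pitfall described above.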
Something like this maybe, starting in K2 and copied down:
=IFERROR(INDEX(D:D,MAX(IFERROR(MATCH(J2,B:B,0),-1),IFERROR(MATCH(J2,E:E,0),-1),IFERROR(MATCH(J2,G:G,0),-1),IFERROR(MATCH(J2,H:H,0),-1))),"")
If you want to keep the positions of the columns for the Match variable, consider creating generic range names for each column you want to check, like "Col1", "Col2", "Col3". Create a few more range names than you think you will need and reference them to =$B:$B, =$E:$E etc. Plug all range names into Match functions inside the Max() statement as above.
When columns are added or removed from the table, adjust the range name definitions to the columns you want to check.
For example, if you set up the formula with five Matches inside the Max(), and the table changes so you only want to check three columns, point three of the range names to the same column. The Max() will only return one result and one lookup, even if the same column is matched several times.
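The MAX-of-MATCH idea can be sketched in Python as follows. The column contents are made-up sample data, and the function names match and lookup are hypothetical; the point is only that a miss counts as -1, so the maximum picks the one column where the value was found.

```python
def match(value, column):
    """1-based position of value in column, or -1 if absent,
    like IFERROR(MATCH(value, column, 0), -1)."""
    try:
        return column.index(value) + 1
    except ValueError:
        return -1


def lookup(value, id_columns, want_column):
    """Return the want_column entry on the row where value is found in any
    of id_columns, or "" when it is found nowhere."""
    row = max(match(value, col) for col in id_columns)
    return want_column[row - 1] if row > 0 else ""


id1 = ["a1", "", "c1"]            # sample ID columns, one list per column
id2 = ["", "b2", ""]
want = ["first", "second", "third"]

print(lookup("b2", [id1, id2], want))   # found in id2, row 2
print(lookup("zz", [id1, id2], want))   # not found anywhere
```

Listing the same column twice in id_columns does not change the result, which matches the note about pointing several range names at the same column.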
I came up with a vba solution if I understood correctly:
Sub DisplayActiveRange()
    Dim sheetToSearch As Worksheet
    Set sheetToSearch = Sheet2
    Dim sheetToOutput As Worksheet
    Set sheetToOutput = Sheet1
    Dim search As Range
    Dim output As Range
    Dim searchCol As String
    searchCol = "J"
    Dim outputCol As String
    outputCol = "K"
    Dim valueCol As String
    valueCol = "D"
    Dim r As Range
    Dim currentRow As Long
    Dim maxRow As Long
    maxRow = sheetToOutput.UsedRange.Rows.Count
    For currentRow = 1 To maxRow
        ' value to look for, taken from the search column of the output sheet
        Set search = sheetToOutput.Range(searchCol & currentRow)
        For Each r In sheetToSearch.UsedRange
            If r.Value <> "" Then
                If r.Value = search.Value Then
                    Set output = sheetToOutput.Range(outputCol & currentRow)
                    ' take the value from the row where the match was found
                    output.Value = sheetToSearch.Range(valueCol & r.Row).Value
                    Exit For ' stop at the first match; the outer loop moves to the next row
                End If
            End If
        Next r
    Next currentRow
End Sub
There might be better ways of doing it, but this will give you what you want. We assume headers in both the "source" and "destination" sheets. You will need to adapt the "Const" declarations according to how your sheets are named. Press Alt+F11 in Excel to bring up the VBA editor, copy and paste this code into "ThisWorkbook" under the "VBAProject" group, then select "Run" from the menu:
Option Explicit
Private Const sourceSheet = "Source"
Private Const destSheet = "Destination"
Public Sub FindColumns()
    Dim rowCount As Long
    Dim foundValue As String
    Sheets(destSheet).Select
    rowCount = 1 'Assume a header row
    Do While Range("J" & rowCount + 1).value <> ""
        rowCount = rowCount + 1
        foundValue = FncFindText(Range("J" & rowCount).value)
        Sheets(destSheet).Select
        Range("K" & rowCount).value = foundValue
    Loop
End Sub
Private Function FncFindText(value As String) As String
    Dim rowLoop As Long
    Dim colLoop As Integer
    Dim found As Boolean
    Dim pos As Long
    Sheets(sourceSheet).Select
    rowLoop = 1
    colLoop = 0
    Do While Range(alphaCon(colLoop + 1) & rowLoop + 1).value <> "" And found = False
        rowLoop = rowLoop + 1
        Do While Range(alphaCon(colLoop + 1) & rowLoop).value <> "" And found = False
            colLoop = colLoop + 1
            pos = InStr(Range(alphaCon(colLoop) & rowLoop).value, value)
            If pos > 0 Then
                FncFindText = Mid(Range(alphaCon(colLoop) & rowLoop).value, pos, Len(value))
                found = True
            End If
        Loop
        colLoop = 0
    Loop
End Function
Private Function alphaCon(aNumber As Integer) As String
    Dim letterArray As String
    Dim iterations As Integer
    letterArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    If aNumber <= 26 Then
        alphaCon = (Mid$(letterArray, aNumber, 1))
    Else
        If aNumber Mod 26 = 0 Then
            iterations = Int(aNumber / 26)
            alphaCon = (Mid$(letterArray, iterations - 1, 1)) & (Mid$(letterArray, 26, 1))
        Else
            'we deliberately round down using 'Int' as anything with decimal places is not a full iteration.
            iterations = Int(aNumber / 26)
            alphaCon = (Mid$(letterArray, iterations, 1)) & (Mid$(letterArray, (aNumber - (26 * iterations)), 1))
        End If
    End If
End Function
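The column-number-to-letter conversion that alphaCon performs can also be written as a short loop. The Python sketch below is an alternative formulation rather than a translation of the VBA above; the function name column_letters is hypothetical, and it handles any positive column number instead of stopping at two letters.

```python
def column_letters(number):
    """Convert a 1-based column number to its letter name (1 -> A, 27 -> AA)."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while number > 0:
        # Shift to 0-based before dividing, because there is no zero digit:
        # Z is 26, and the next name is AA rather than A0.
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


for n in (1, 26, 27, 52, 702, 703):
    print(n, column_letters(n))
```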

Resources