I started working with the xlrd package for Python 3.7.
I have an Excel file that contains a fixed number of columns (20), but the number of rows varies from column to column (e.g. the first column has 21 rows, the second column has 14 rows).
I wrote this:
for col in range(worksheet.ncols):
    rows_number = worksheet.nrows
    print(rows_number)
I'd like to know the number of rows for each column. With this code, I get 20 times (number of columns) the number of rows inside the first column. Actually I understand why. I'm iterating the nrows without changing the column.
How can I get the number of rows for each column?
If I try the following, I get an AttributeError, since col is an integer and has no nrows attribute.
for col in range(worksheet.ncols):
    rows_number = col.nrows
    print(rows_number)
Thank you for your help!!
You can use a list comprehension to collect the non-empty cells in a column, using xlrd's col method and checking each cell's type, and then take the length of the result:
for colx in range(worksheet.ncols):
    # ctype 0 is xlrd.XL_CELL_EMPTY
    non_emptycells = [i for i, x in enumerate(worksheet.col(colx)) if x.ctype != 0]
    print(len(non_emptycells))
You could use a nested while loop to work out the number of rows in each column:
for c in range(worksheet.ncols):
    counter = 0  # count of the rows
    # stop at the first empty cell or at the end of the sheet
    while counter < worksheet.nrows and worksheet.cell(counter, c).value != EMPTY:
        counter += 1
    print(counter)
Replace the word EMPTY with a test to see if the cell is empty or not. You could then store this info in a dictionary for example once the while loop is done.
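As a rough sketch of the dictionary idea, the function below records, for each column, the position of its last non-empty cell. It assumes empty cells come back from col_values as empty strings. The names rows_per_column and FakeSheet are hypothetical; FakeSheet is only a stand-in so the example runs without a workbook.

```python
def rows_per_column(worksheet):
    """Map each column index to the 1-based position of its last non-empty cell."""
    counts = {}
    for colx in range(worksheet.ncols):
        last_filled = 0
        for rowx, value in enumerate(worksheet.col_values(colx)):
            if value != "":  # assumption: empty cells are returned as ""
                last_filled = rowx + 1
        counts[colx] = last_filled
    return counts


class FakeSheet:
    """Hypothetical stand-in exposing only ncols and col_values, like an xlrd sheet."""

    def __init__(self, columns):
        self._columns = columns
        self.ncols = len(columns)

    def col_values(self, colx):
        return self._columns[colx]


demo = FakeSheet([["a", "b", "c"], ["x", "", ""]])
print(rows_per_column(demo))  # {0: 3, 1: 1}
```

With a real workbook you would pass the sheet object itself, for example the one returned by workbook.sheet_by_index(0).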
Related
I have a spreadsheet (Gantt chart) with dates in a column. Refer to the table below. The "Row" column is the row number in Excel, not a real column.
Row  Depends on Row(s) (Col F)  Start (Col G)  End (Col H)  Notes
9                               7/24/21        7/26/21
10   9                          7/27/21        7/30/21      Starts 1 day after row 9 ends.
11                              7/25/21        7/27/21
12   9,11                       7/28/21        7/29/21      Starts 1 day after MAX(row 9 end, row 11 end).
How do I automatically set cells "Start10" and "Start12" to read from cell "DependsOnRows" in their row to get the max of any numbers in the "DependsOnRows" column using a formula or other method?
Currently, I'm using this formula in cells "Start10" and "Start12", which includes a manually typed "max" function:
Start10:
=WORKDAY(MAX(H9), 1, Holidays!A$2:A$99)
Start12:
=WORKDAY(MAX(H9,H11), 1, Holidays!A$2:A$99)
I want to automate the reading of the row numbers inside the max function so they are read from the "DependsOnRows" column.
I can use any format in the "DependsOnRows" column. I can use braces, brackets, commas, spaces, whatever. The list just ideally needs to be in 1 cell, not multiple.
You can use the FILTERXML function to change the string of row numbers into a dynamic array. From there, you can INDEX the position of each row along with a fixed column number (in this case 8, or column H) to get test values for the MAX function.
Something like the below works for me:
=WORKDAY(MAX(
INDEX($A$1:$I$13,
FILTERXML("<t><s>" &SUBSTITUTE($F13, ",", "</s><s>")&"</s></t>", "//s"),8)),
1, Holidays!A$2:A$99)
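To make the logic of that formula easier to follow, here is a small Python sketch of the same steps: split the comma-separated row list, look up each row's end date, take the maximum, and move to the next working day. It is an illustration only, not a replacement for the Excel formula. The names next_workday, dependent_start and end_dates are hypothetical, and weekends are assumed to be Saturday and Sunday.

```python
from datetime import date, timedelta


def next_workday(day, holidays=()):
    """Return the first working day after `day` (roughly WORKDAY(day, 1, holidays))."""
    candidate = day + timedelta(days=1)
    # weekday() is 5 for Saturday and 6 for Sunday
    while candidate.weekday() >= 5 or candidate in holidays:
        candidate += timedelta(days=1)
    return candidate


def dependent_start(depends_on, end_dates, holidays=()):
    """Start one working day after the latest end date of the rows listed in `depends_on`."""
    rows = [int(part) for part in depends_on.split(",") if part.strip()]
    latest_end = max(end_dates[row] for row in rows)
    return next_workday(latest_end, holidays)


# End dates (column H) for rows 9 and 11 from the table in the question.
end_dates = {9: date(2021, 7, 26), 11: date(2021, 7, 27)}

print(dependent_start("9", end_dates))     # 2021-07-27
print(dependent_start("9,11", end_dates))  # 2021-07-28
```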
I have a table of dates and a variable date range, and I need to find the row and column of every date in the table that lies within my start/end date range.
As a (downscaled) example of my case:
Start date: 01Jan2018
End date: 30Jun2018
Table with dates:
{01Jan2018; 01Feb2018; 01Apr2018}
{17Mar2018; 05Jun2018; 16Aug2018}
{11Apr2018; 01Jul2018; }
Some fields in the table may be blank if no date has been entered yet. I believe I can make a comparison array by running the start/end dates against the date array, e.g. with
=--(array>=start_date)*(array<=end_date)
which would output
{1;1;1}
{1;1;0}
{1;0;0}
But what's the next step to get from here to a vertical list of row-column sets where row and column number is in separate cells? From the example above I would need a list like:
1 1
2 1
3 1
1 2
2 2
1 3
I have other arrays sized like the date array that I need to match the found coordinates against to look up other data using the found coordinates.
Try:
=IF(AND(A4>=$B$1,A4<=$B$2),"In","Out")
Results:
A bit of a long formula, but basically this one formula will:
1. Get the relevant date using INDEX (based on ROW()) and the matrix (3x3)
2. Convert it to 1s and 0s based on whether it's between the dates
3. Return the Column / Row number if 1
For Column:
=IF(--(INDEX($A$4:$C$6,ROUNDUP((ROW()-1)/3,0),IF(MOD((ROW()-1),3)=0,3,MOD((ROW()-1),3)))>=$B$1)*(INDEX($A$4:$C$6,ROUNDUP((ROW()-1)/3,0),IF(MOD((ROW()-1),3)=0,3,MOD((ROW()-1),3)))<=$B$2)<>0,ROUNDUP((ROW()-1)/3,0),"n/a")
For Row:
=IF(--(INDEX($A$4:$C$6,ROUNDUP((ROW()-1)/3,0),IF(MOD((ROW()-1),3)=0,3,MOD((ROW()-1),3)))>=$B$1)*(INDEX($A$4:$C$6,ROUNDUP((ROW()-1)/3,0),IF(MOD((ROW()-1),3)=0,3,MOD((ROW()-1),3)))<=$B$2)<>0,IF(MOD((ROW()-1),3)=0,3,MOD((ROW()-1),3)),"n/a")
Image of the excel solution
Based on the size of your array, you need to create a list of all coordinates (formulas to generate it are at the end of the post). Then fetch the value at each coordinate, check it against your condition (column "C") and filter where TRUE (column "D").
And with formulas shown:
Result (highlighted with green):
Here are two formulas generating the list of columns and rows based on data range size:
A9: =IF(B9="end";"end";MOD(ROW(C1)-1+COLUMNS($A$1:$C$3);COLUMNS($A$1:$C$3))+1)
B9: =IF(1+INT((ROW(C1)-1)/COLUMNS($A$1:$C$3))>COLUMNS($A$1:$C$3);"end";1+INT((ROW(C1)-1)/COLUMNS($A$1:$C$3)))
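For comparison, the same "enumerate every coordinate, test it, keep the matches" idea can be sketched in a few lines of Python. This is only an illustration of the approach, using the sample dates from the question. The name matches_in_range is hypothetical, and blank cells are assumed to be represented as None.

```python
from datetime import date


def matches_in_range(table, start, end):
    """Return 1-based (row, column) pairs of dates within [start, end], column by column."""
    coords = []
    n_cols = max(len(row) for row in table)
    for c in range(n_cols):
        for r, row in enumerate(table):
            cell = row[c] if c < len(row) else None
            # assumption: blank cells are None and never match
            if cell is not None and start <= cell <= end:
                coords.append((r + 1, c + 1))
    return coords


table = [
    [date(2018, 1, 1), date(2018, 2, 1), date(2018, 4, 1)],
    [date(2018, 3, 17), date(2018, 6, 5), date(2018, 8, 16)],
    [date(2018, 4, 11), date(2018, 7, 1), None],
]

print(matches_in_range(table, date(2018, 1, 1), date(2018, 6, 30)))
# [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (1, 3)]
```

The resulting pairs can then be used to index other tables of the same shape.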
I have a table like this, where I want to insert formulas in column B to arrive at the indicated values.
The logic is this - I want to count every alternate cell in that particular row, starting from column C till column AA, and get the number of cells that contain a date value greater than or equal to Target date.
Cols/Rows A B C D E F G
1 Target Date X X Date Y Y Date Z Z Date
2 13-12-2015 2 13-12-2015 13-01-2016
3 24-11-2015 1 25-11-2015 20-10-2015
4 23-01-2016 0
5 30-01-2016 0 06-06-2016 14-04-2015
To begin with, before I put the condition on the date, I first tried to get the number of alternate columns in this range by using the array formula =IF(MOD(COLUMN($C4:$AA4),2)=0,COLUMNS($C4:$AA4))
But this returns FALSE for some reason. Only if this returns a numeric value, I can proceed with adding a condition for dates.
How do I modify the formula? Any help is appreciated!
You want to use SUMPRODUCT():
=SUMPRODUCT((MOD(COLUMN($C4:$AA4),2)=0)*($C4:$AA4>=DATEVALUE("13/1/2015"))*($C4:$AA4<=DATEVALUE("13/1/2016")))
This will return a count of every other column that has a date between 13/1/2015 and 13/1/2016
I don't have access to Excel at the moment to check, but I suspect the issue is that you're calling COLUMN with arguments. I think if you remove the range, and do =IF(MOD(COLUMN(), 2)=0, COLUMNS($C4:$AA4)), the IF will at least be true on even columns. COLUMN returns the numeric index of the current column, so you might need to fiddle with things to get this working.
The brute-force approach, write as many of these as you need,
=1*(C5>A5) + 1*(E5>A5) + 1*(G5>A5) + 1*(I5>A5)
and so on.
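The counting rule itself (look at every other cell and count the dates on or after the target) can be sketched in Python as follows. This is only an illustration of the logic, not an Excel solution. The name count_dates_on_or_after is hypothetical, the list is assumed to hold the cells from column C onward with the date cells at every second position starting from column D, and the exact placement of the sample dates within the row is a guess.

```python
from datetime import date


def count_dates_on_or_after(cells, target, first_date_index=1, step=2):
    """Count date cells at every `step`-th position that are >= target."""
    return sum(
        1
        for cell in cells[first_date_index::step]
        if isinstance(cell, date) and cell >= target  # skips blanks and text
    )


# Sample rows 2 and 3 from the question, cells from column C onward (placement assumed).
row_2 = [None, date(2015, 12, 13), None, date(2016, 1, 13)]
row_3 = [None, date(2015, 11, 25), None, date(2015, 10, 20)]

print(count_dates_on_or_after(row_2, date(2015, 12, 13)))  # 2
print(count_dates_on_or_after(row_3, date(2015, 11, 24)))  # 1
```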
I am having trouble comparing two columns that hold a very large amount of data, approximately 5 to 6 lakh (500,000 to 600,000) cells. I used a COUNTIF formula to check whether each value in Column A exists in Column B. However, it takes a huge amount of time to calculate, so I stopped using Excel for that task. I am looking for an alternative way of doing this in pandas.
Is it possible to find the list of values that are unique to Column A when compared with Column B? Please suggest.
Column A: 585256
Column B: 556245
This is quite easy using Python's built-in set data structure.
Below is a simple snippet that returns the set difference.
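def get_difference(file_1, file_2):
    # Read each file into a set of lines; the files are closed when the block ends.
    with open(file_1, encoding='utf-8') as f1, open(file_2, encoding='utf-8') as f2:
        data_1 = set(f1.read().splitlines())
        data_2 = set(f2.read().splitlines())
    # Lines present in file_1 but missing from file_2.
    return data_1 - data_2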
I checked the performance with data of around 500,000 lines, and the script produced the result within 2 seconds.
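If the two columns are already in memory rather than in text files, a variant of the same idea might look like the sketch below. It assumes each column is a plain Python list; the name only_in_first is hypothetical. Unlike a bare set difference, it keeps the values in their original order.

```python
def only_in_first(column_a, column_b):
    """Return values of column_a missing from column_b, deduplicated, in original order."""
    seen_in_b = set(column_b)           # set membership tests are fast
    unique_a = dict.fromkeys(column_a)  # dicts keep insertion order
    return [value for value in unique_a if value not in seen_in_b]


print(only_in_first([3, 1, 3, 7, 5], [1, 5, 9]))  # [3, 7]
```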
I have an Excel sheet with a few thousand rows of data. After applying a filter, some of the rows in between are hidden. My current code only counts the first few contiguous rows; the count stops even though there are more visible rows after them. How do I fix this?
Example of row numbers after applying the filter:
1
2
3
7
8
...
The count will only return 3. I am using the code below to do a row count.
print "Rows " & objsheet.Usedrange.SpecialCells(xlCellTypeVisible).Rows.Count
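The answer was given by Tim Williams in a comment. Rows.Count on a range made up of several areas only reports the rows of the first area, so count the visible cells of a single column instead: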
objsheet.usedrange.columns(1).specialcells(xlCellTypeVisible).count