Extracting data from a series of Excel files

I have a series of Excel files with multiple cities and sensors in a single worksheet. What would be the fastest and most accurate way to extract the data for a given city X and sensor Y?
In my case, I need only the data from the sensor "umidade_solo_nivel1".
I could do it with a pivot table, but that would take a lot of time, since there are 66 cities and a different Excel worksheet for each month of the year.
Following is an image of the worksheet that helps you understand how the data is organized.
Thank you in advance for any help you can provide.

One solution is to set up your file to include "indicators" and the =INDIRECT() function. Using this, you can set up a series of indicators (in separate rows) which include the sheet name, row, and column of data you are looking for. This can be somewhat time-intensive to set up at first, but can definitely pay off in the long run, especially if your data are periodically updated or if more data is entered. All it takes is a little creativity.
If you are not familiar with the function, you could check out this website:
http://www.contextures.com/xlFunctions05.html
As David Lee mentioned in your comments, VBA or other languages like Python or C# could also be of use. As we cannot fully understand from your description what you need specifically or what all of your data look like, I would recommend learning the function I mentioned and evaluating whether it would be helpful in this context before learning another language like VBA (though I highly recommend learning VBA if you anticipate spending lots of time in Excel).
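For readers who take the Python route mentioned above, here is a minimal sketch of the filtering step. The real worksheet layout is not visible in the question, so everything about the data is an assumption: each monthly sheet is assumed to have been exported to CSV text with hypothetical columns "cidade", "sensor", "data" and "valor", and the function name `extract_readings` is made up for illustration.

```python
import csv
import io


def extract_readings(monthly_csv_texts, city, sensor):
    """Collect the rows for one city and one sensor from several monthly tables.

    Assumes (hypothetically) that every table has "cidade" and "sensor" columns.
    """
    matches = []
    for text in monthly_csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if row["cidade"] == city and row["sensor"] == sensor:
                matches.append(row)
    return matches


# Two tiny made-up monthly exports; the column names and values are invented.
january = (
    "cidade,sensor,data,valor\n"
    "CidadeA,umidade_solo_nivel1,2020-01-01,0.31\n"
    "CidadeA,temperatura,2020-01-01,25.0\n"
    "CidadeB,umidade_solo_nivel1,2020-01-01,0.28\n"
)
february = (
    "cidade,sensor,data,valor\n"
    "CidadeA,umidade_solo_nivel1,2020-02-01,0.35\n"
)

rows = extract_readings([january, february], "CidadeA", "umidade_solo_nivel1")
for row in rows:
    print(row["data"], row["valor"])
```

The same loop works unchanged for 12 monthly tables and any of the 66 cities; only the list of inputs and the two filter values change.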

Related

Filling in an array (table) with no Horizontal (row) or Vertical (column) duplicates formulaically in Excel

This may be impossible, but I am trying to create a baseball fielding lineup generator that has a few constraints provided by the league (i.e. must play twice in infield, twice in outfield, no repeats at any position). I think this would be a fairly simple task in any programming language, but I am designing this for my 70yo uncle and he can basically only use Excel with no macros. So I can't brute force my way through the problem, and I don't think I understand the mathematics behind it well enough to even know whether there is a formula-only solution in Excel.
At its essence it's a Sudoku creator and solver with no repeats in the rows or columns. I have an OK solution for the infield/outfield part via ranking the individuals by position.
Recursively calculating with F9 is ok for the solution since this is just a change in the options menu, but the last time I sent him a macro his university MS account wouldn't let him run it or change the settings.
Well, I have plenty of other info and have gotten close to solutions using huge nested IFs, but this relatively brute force method seems to be pretty dumb and is not giving great solutions.
Thanks for any help!
Well if your uncle has access to Excel 365 (sounds like a big if), you could use sorting to remove players already used in a given row or column, e.g. like this copied down and across from B2:
=LET(range,$L$2:$L$27,
choice,IF(COUNTIF(B$1:B1,range)+COUNTIF($A2:A2,range)=0,range),
count,COUNT(choice),
sort,SORT(choice),
INDEX(sort,RANDBETWEEN(1,count)))
If the players were given names instead of numbers, you could try
=LET(range,$L$2:$L$27,
choice,IF(COUNTIF(B$1:B1,range)+COUNTIF($A2:A2,range)=0,range),
count,SUM(--ISTEXT(choice)),
sort,SORT(choice),
INDEX(sort,RANDBETWEEN(1,count)))
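For comparison, the "no repeats in any row or column" constraint can be sketched outside Excel. This is not the formula approach above but a different, swapped-in technique: a cyclic Latin-square construction with shuffled rows, columns and player order. The function name `random_lineup_grid` and its parameters are hypothetical, and the infield/outfield rule from the question is not handled.

```python
import random


def random_lineup_grid(players, n_rows, n_cols, rng=random):
    """Return an n_rows x n_cols grid with no player repeated in a row or column.

    Cell (r, c) holds player number (r + c) mod n, which is the classic cyclic
    Latin square. Picking distinct rows and columns and shuffling the players
    keeps the no-repeat property while varying the result.
    """
    n = len(players)
    if n_rows > n or n_cols > n:
        raise ValueError("need at least as many players as rows and columns")
    order = list(players)
    rng.shuffle(order)
    rows = rng.sample(range(n), n_rows)  # distinct row offsets
    cols = rng.sample(range(n), n_cols)  # distinct column offsets
    return [[order[(r + c) % n] for c in cols] for r in rows]


# Example: 6 innings (rows) by 9 positions (columns) for 9 made-up players.
grid = random_lineup_grid(list("ABCDEFGHI"), n_rows=6, n_cols=9)
for inning in grid:
    print(" ".join(inning))
```

Unlike picking cells one at a time, this construction cannot run out of valid players partway through the grid, but it only produces a subset of all possible arrangements.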

Dynamically generate list of payment dates considering first/last date and client - ideal for controlling receivables of SaaS and recurrent contracts

I'm building an accounts receivable sheet in Google Sheets.
I would like to register the clients and their contract characteristics (client, payment frequency and price) in one sheet and I would like to dynamically generate the payment dates in another sheet.
The input sheet would look like this:
The output sheet would look like this:
I think it might be something in the QUERY and ARRAYFORMULA universe but I don't know how to configure it. Is there a way to dynamically generate the combination of Date and Client, taking into consideration first and last payment dates?
A sample is in this link. If you'd like to use it, please feel free to create a copy for yourself and post it in your answer.
Creating a 2D array of concatenated strings of dates and values can be a good first step in these kinds of problems.
I've demonstrated the idea in a tab called MK.Help on this sheet that I also shared in the comment above. This formula can be found in cell A2 and is generating the whole list:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(Input!A2:A5&"|"&Input!D2:D5+SEQUENCE(1,CEILING(MAX(IFERROR((Input!E2:E5-Input!D2:D5)/Input!C2:C5))),0)*Input!C2:C5&"|"&Input!E2:E5),"|",0,0),"select Col2, Col1 where Col2<=Col3 order by Col2"))
Once you have the data in a big 2D array, you can flatten it out and then split it into its component parts to make it queryable. I've tried to outline the process to the right of the solution.
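The logic the formula encodes (start at the first payment date, step by the payment frequency, keep dates up to the last payment date, then sort by date) can be sketched in Python as follows. This assumes the frequency in Input!C is a number of days, which is how the formula uses it; the function name `payment_schedule` and the sample contracts are made up.

```python
from datetime import date, timedelta


def payment_schedule(contracts):
    """Expand contracts into (due_date, client) rows sorted by date.

    contracts: iterable of (client, frequency_days, first_date, last_date).
    """
    rows = []
    for client, frequency_days, first, last in contracts:
        if frequency_days <= 0:
            raise ValueError("frequency_days must be positive")
        due = first
        while due <= last:  # same cut-off as "where Col2<=Col3"
            rows.append((due, client))
            due += timedelta(days=frequency_days)
    return sorted(rows)  # same ordering idea as "order by Col2"


# Invented example contracts.
contracts = [
    ("Acme", 30, date(2021, 1, 1), date(2021, 3, 1)),
    ("Beta", 7, date(2021, 1, 5), date(2021, 1, 20)),
]
for due, client in payment_schedule(contracts):
    print(due.isoformat(), client)
```

Because each contract is expanded independently, the number of contracts and the length of each schedule can vary freely.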
@MattKing's answer was really good, but I had problems with it because all my inputs have dynamic sizes, and following his step-by-step I couldn't figure out how to adapt it to this situation.
So, using a lot of Matt's inspiration and some extra research (including this new question) I came to a solution that worked better for me, using multiple pages to come to a final result. Not so classy but works.
I left my solution available in this sheet.
Even so, I've chosen to accept Matt's answer since it worked, it helped me, it looks "more pythonic", and maybe the need to be so dynamic wasn't so clear in the question.

How to extract list of unique items from a column in a large data set

I am trying to extract unique Article Numbers from a big database inside Excel. It will go up to 15000-20000 unique article numbers. I have tried to use the formula below to solve this, and it does work, but it slows the document down so much that it becomes a pain to work with. This will be used every day, so at this pace it would be unbearable. Do you know any good ways of speeding this up? I read something about binary search, but I don't know how to implement that in the formula below. Any help is appreciated :)
=IFERROR(IF(LOOKUP(2;1/(COUNTIF($A$1:A1;Unique) =0);Unique)= 0; "";LOOKUP(2;1/(COUNTIF($A$1:A1;Unique) =0);Unique) );"")
//The unique is just a named range, so it doesn't have to handle the full 20000 rows at all times
Oh that would work. I can write some vba code to perform that action :) Ty very much
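As an illustration of why a scripted pass can be much faster than one COUNTIF per row, here is a minimal Python sketch of order-preserving de-duplication. It is only a sketch of the general idea, not the VBA the comment refers to; the function name `unique_in_order` and the sample article numbers are invented. Each value is checked against a set, which takes roughly constant time per item, instead of rescanning the growing output list.

```python
def unique_in_order(values):
    """Return each distinct non-blank value once, in first-seen order."""
    seen = set()
    result = []
    for value in values:
        if value is None or value == "":
            continue  # skip blank cells
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Invented article numbers with repeats and a blank.
articles = ["A100", "B200", "A100", "", "C300", "B200"]
print(unique_in_order(articles))
```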

How to optimize COUNTIFS with very large data

I would like to create a report that looks like the picture below.
My data has around 500,000 cells (and it will continue to grow).
Right now I'm using Excel's COUNTIFS function, but it takes a very long time to calculate. (I cannot turn off automatic calculation.)
The main value is stored as a date and the dates span about 3 years, so I need a lot of formulas to cover the whole range of values.
result
The picture below shows the data source. The top table cannot be changed, while the bottom one is the one I created myself (and can change). I use WEEKNUM to convert each date to a week number.
data
Is there a better formula, or any other way to make this file faster? Suggestions of every kind are welcome!
I was thinking about using a pivot table, but I don't know how to build one from this kind of data source.
PS. VBA is the last option.
You can download example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
I will post this answer with the disclaimer that it is entirely dependent on the size of the data set. Turning automatic calculation off and on is the best approach, but your question rules that out, so keep reading.
Your question made me curious, so I gave it a try and timed it. I essentially set up two columns of over 100,000 random numbers between 1 and 1000 and then ran a COUNTIFS on the two columns to count where they were equal. I made a macro that turns off automatic calculation, inserts the start time, calculates, and then inserts the finish time. I highlighted the time difference in yellow.
First I tried your way, two criteria, countifs:
Then I tried to combine (concatenate) the two columns to see if I could make it easier by having only one COUNTIF criterion and data set. It doesn't help. See the result below:
Finally, realizing what was going on, I decided to make the criteria match only the FIRST character of the number to look for. I was essentially reducing the number of characters to check per cell. This had a positive result. See below:
Therefore my suggestion is to limit the length of the values you are comparing in any way possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best way possible without going to manual calculation.
I have worked with Excel sheets of a similar size. Especially if you are using the data on a regular basis, I would heartily recommend switching to a proper database (SQL-based, Access, or whatever fits your purpose). It does wonders for the speed, and you also won't run into the size limits of Excel. :-)
You can import the data you have now fairly easily.
I am happy as a clam with my PostgreSQL db.
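Whichever tool ends up holding the data, the report boils down to a grouped count. The sketch below shows that idea in Python as one pass over the rows, rather than one scan of the data per report cell. The record layout (a date plus a category label) and the function name `weekly_counts` are assumptions, since the real columns are only shown in the pictures. It uses ISO week numbers, which can differ from Excel's default WEEKNUM numbering.

```python
from collections import Counter
from datetime import date


def weekly_counts(records):
    """Count records per (ISO year, ISO week, category) in a single pass.

    records: iterable of (date, category) pairs (hypothetical layout).
    """
    counts = Counter()
    for day, category in records:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1], category)] += 1
    return counts


# Invented sample rows.
records = [
    (date(2021, 1, 4), "open"),
    (date(2021, 1, 5), "open"),
    (date(2021, 1, 5), "closed"),
    (date(2021, 1, 11), "open"),
]
for key, n in sorted(weekly_counts(records).items()):
    print(key, n)
```

In a database the same grouping would typically be expressed as a GROUP BY over the week and the category.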

A question on using Microsoft Excel to do data analysis

My mates and I have been given an Excel spreadsheet that contains the data of the Census report.
Each column corresponds to a "Year" (from 2000 to 2010), each row is a "City/Town", and each cell is simply the population of a town in a particular year.
We want to do some analysis, such as “which town had its first population gain in 10 years?”, and so on.
My question is simply whether we can do this in Excel, or whether we need to export the data to another database (SQL) and do the programming there.
Thanks in advance.
Yes, you can do it in Excel. Does that answer your question?
For that specific piece of data analysis:
{=AND(SUM(--((C2:K2-B2:J2)>0))=0,L2>K2)}
Don't enter the curly braces, enter the formula with Control+Shift+Enter.
But if you're more comfortable in SQL, then you should import it into a database and work with it that way. Put it where you have the best skill set and you'll get the job done the quickest.
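For anyone who does move the data out of Excel, the same test as the array formula above can be sketched in Python: no year-over-year increase in any earlier year, and an increase in the final year. The function name `first_gain_in_final_year` and the sample figures are invented for illustration.

```python
def first_gain_in_final_year(populations):
    """True if the last year is the first (and only) year-over-year increase.

    populations: yearly values, oldest first (e.g. 2000 through 2010).
    """
    changes = [later - earlier
               for earlier, later in zip(populations, populations[1:])]
    if not changes:
        return False  # fewer than two years of data
    no_earlier_gain = all(change <= 0 for change in changes[:-1])
    return no_earlier_gain and changes[-1] > 0


# Invented populations for two towns, 2000 through 2010.
declining_then_up = [900, 890, 890, 870, 860, 850, 850, 840, 830, 820, 835]
grew_earlier = [900, 910, 905, 900, 895, 890, 885, 880, 875, 870, 880]
print(first_gain_in_final_year(declining_then_up))
print(first_gain_in_final_year(grew_earlier))
```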
You can do a substantial amount in Excel, both by simply using the facilities provided as cell functions and, if that doesn't suffice, by using VBA (Visual Basic for Applications), which is built into Excel, for more complex analysis.

Resources