I will try to explain my question as clearly as I can. It would be great if you could help me out.
(1) I work at a tech support company where I work on storage arrays such as VNX and XtremIO. I want to fetch a report of the data usage of each storage pool available on the array.
(2) Since the data changes daily, I want the report to be generated at a particular time every day and delivered to my mail (Outlook, Gmail, etc.) using Python.
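A minimal sketch of the emailing half, assuming the pool-usage numbers have already been collected somehow (the collect_pool_usage helper, SMTP host, and credentials below are placeholders, not real values; the actual collection step depends on the array's CLI or REST interface):

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp.example.com"      # placeholder: your mail server
SMTP_USER = "reports@example.com"   # placeholder credentials
SMTP_PASS = "app-password"

def collect_pool_usage():
    # Placeholder: query the array (e.g. via its CLI or REST API)
    # and return one line per storage pool.
    return ["Pool_A: 62% used", "Pool_B: 41% used"]

def send_daily_report():
    msg = EmailMessage()
    msg["Subject"] = "Daily storage pool usage"
    msg["From"] = SMTP_USER
    msg["To"] = "me@example.com"
    msg.set_content("\n".join(collect_pool_usage()))

    with smtplib.SMTP(SMTP_HOST, 587) as smtp:
        smtp.starttls()
        smtp.login(SMTP_USER, SMTP_PASS)
        smtp.send_message(msg)

if __name__ == "__main__":
    # Run once; schedule it daily with cron or Windows Task Scheduler.
    send_daily_report()
```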
Thank You!!
I currently have approximately 10M rows and ~50 columns in a table that I wrap up and share as a pivot. However, this also means it takes approximately 30 minutes to 1 hour to download the CSV, or much longer to use a Power Query ODBC connection directly to Redshift.
So far the best solution I've found is to use Python with redshift_connector to run update queries and UNLOAD a gzipped result set to an S3 bucket, then use boto3 and gzip to download and unzip the file, and finally refresh from the CSV. This resulted in a 600 MB Excel file compiled in ~15-20 minutes.
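Roughly the shape of that pipeline, as a sketch only: the cluster endpoint, credentials, bucket, IAM role, table, and the unloaded key name are all illustrative placeholders.

```python
import gzip
import shutil

import boto3
import redshift_connector

# Placeholder connection details for the Redshift cluster.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="report_user",
    password="...",
)

# UNLOAD a single gzipped CSV to S3 (table, bucket, and role are examples).
unload_sql = """
    UNLOAD ('SELECT * FROM reporting.pivot_source')
    TO 's3://my-bucket/exports/pivot_source_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'
    GZIP PARALLEL OFF ALLOWOVERWRITE CSV HEADER;
"""
cur = conn.cursor()
cur.execute(unload_sql)
conn.commit()

# Download and decompress the unloaded part file (key name is illustrative).
s3 = boto3.client("s3")
s3.download_file("my-bucket", "exports/pivot_source_000.gz", "pivot_source.csv.gz")
with gzip.open("pivot_source.csv.gz", "rb") as src, open("pivot_source.csv", "wb") as dst:
    shutil.copyfileobj(src, dst)
```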
However, this process still feels clunky, and sharing a 600 MB Excel file among teams isn't great either. I've searched for several days but I'm no closer to finding an alternative: what would you use if you had to share a drillable table/pivot among a team with a 10 GB datastore?
As a last note: I thought about programming a couple of PHP scripts, but my office doesn't have the infrastructure to support that.
Any help or ideas would be most appreciated!
Call a meeting with the team and let them know about the constraints; you will get some suggestions and you can offer some of your own.
Suggestions from my side:
For the file part:
Reduce the data. For example, if it is time dependent, increase the interval: hourly data can be aggregated to daily data (see the sketch after this list).
If the data relates to groups, you can split the file into separate parts, one file per group.
Or send only the final reports and numbers they require rather than the full data.
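A minimal illustration of the first suggestion (collapsing hourly data to daily) using pandas; the file and column names here are invented for the example:

```python
import pandas as pd

# Example frame: one row per hour with an invented "value" column.
hourly = pd.read_csv("hourly_extract.csv", parse_dates=["timestamp"])

# Collapse to one row per day; sum() could be mean()/last() depending on the metric.
daily = (
    hourly.set_index("timestamp")
          .resample("D")["value"]
          .sum()
          .reset_index()
)
daily.to_csv("daily_extract.csv", index=False)
```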
For a fully functional app:
You can buy a desktop PC (if budget is a constraint, buy a used one or repurpose any desktop or laptop from old inventory) and create a PHP/Python web application that does all the steps automatically.
Create a local database and link it to the application.
Create the charting, pivoting, etc. modules in that application, and remove Excel from the process altogether.
You can even use pre-built applications for the charting and pivoting part; Oracle APEX is one example.
We are moving to an online D365 environment, and we're trying to determine how much data we're using in the Dataverse tables. Under Capacity I can go into the environment details and see how much space each table uses, but it's a daily value. We're trying to remove some data because we're reaching some of our capacity limits, but I can't find where it shows how much data is being used per table in real time. Thanks for any advice on how to pull this.
This article does mention how to get to the capacity limits and usage, but all values appear to be updated only daily:
https://learn.microsoft.com/en-us/power-platform/admin/capacity-storage?source=docs
I'm trying to find some way to see the data used in real time.
I have a Power Query in Excel that gets Bitcoin price changes from https://coinmarketcap.com/currencies/bitcoin/. However, refreshing the prices in Excel takes an average of 7 minutes, while the price at the web address above changes on average every 20 seconds or less. This defeats the purpose of my Power Query. How can I speed up Power Query in Excel?
Put yourself in the shoes of those who are willing to help you on a voluntary basis. What exactly did you do to make the query take so long?
Maybe you are trying to read the page via web scraping (aka new web table inference)? Then I have a solution for you that will definitely be faster:
Get an API key from the site and request the data in CSV format.
You are already doing that? I hope you get the point: Stack Overflow is not a guessing game ...
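For illustration, one way to hit such an API from Python rather than scraping the page (the endpoint and parameters follow CoinMarketCap's documented quotes API as far as I know, and this particular endpoint returns JSON; the key is a placeholder and you should check the current docs and your plan's limits):

```python
import requests

API_KEY = "your-api-key"  # placeholder
url = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/quotes/latest"

resp = requests.get(
    url,
    headers={"X-CMC_PRO_API_KEY": API_KEY},
    params={"symbol": "BTC", "convert": "USD"},
    timeout=10,
)
resp.raise_for_status()

# Pull the latest BTC/USD price out of the JSON response.
price = resp.json()["data"]["BTC"]["quote"]["USD"]["price"]
print(f"BTC/USD: {price:,.2f}")
```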
I download end-of-day stock prices for over 20,000 global securities across 20 different markets. I then run my 20,000 proprietary trading setups over these securities to find profitable trades. The process is simple, but it needs the power of cloud computing to automate because it's impossible to run on a desktop.
I'm coming at this solution as a complete beginner so please excuse my lack of technical understanding.
I download the prices from a single source onto my computer as Microsoft Excel files.
Do I use Apache Arrow to convert the Excel files into Apache Parquet? I'm considering Parquet because it's a columnar storage format, which is ideal for historical stock prices.
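For reference, converting one workbook to Parquet with pandas and pyarrow might look like the sketch below; the file names and the openpyxl engine are assumptions.

```python
import pandas as pd

# Read one end-of-day price workbook; reading .xlsx requires openpyxl.
prices = pd.read_excel("eod_prices_2024-01-31.xlsx", engine="openpyxl")

# Write a columnar Parquet file via pyarrow (snappy compression is the default).
prices.to_parquet("eod_prices_2024-01-31.parquet", engine="pyarrow", index=False)
```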
To run my 20,000 proprietary trading setups, I would use Apache Spark to read the Parquet files in my chosen cloud environment.
This would produce the high-probability trade results every day, which would be uploaded to my web-based platform.
This is a very simplified setup based on my current research. Thank you in advance for your assistance.
Kind regards
Levi
I'm sorry, but you don't have a big data setup.
What you are doing is using just one computer to convert Excel files into Parquet. If you are able to read the data and write it back to disk in a reasonable time, it seems you don't have "big data".
What you should do is:
Get data into your data lake using something like Apache NiFi.
Use Spark to read the data from the data lake. For Excel files, see "How to construct Dataframe from a Excel (xls,xlsx) file in Scala Spark?"
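For the Spark side, a minimal PySpark sketch reading Parquet from the data lake (the s3a path and column names are placeholders; the linked question covers reading Excel directly):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("eod-setups").getOrCreate()

# Placeholder path: point this at wherever the data lake stores the Parquet files.
prices = spark.read.parquet("s3a://my-datalake/eod_prices/")

# Example scan (column names are assumptions): highest close per ticker.
prices.groupBy("ticker").agg(F.max("close").alias("max_close")).show(20)
```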
I currently have an Excel-based data extraction method using Power Query and VBA (for docs with passwords). Ideally this would be programmed to run once or twice a day.
My current solution involves setting up a spare laptop on the network that will run the extraction twice a day on its own. This works, but I am keen to understand the other options. The task itself seems to be quite a struggle for our standard hardware. It covers 6 network locations across 2 servers with around 30,000 rows and growing.
Any suggestions would be greatly appreciated
Thanks
If you are going to work with growing data and dedicate a laptop exclusively to the process, I would consider installing a database on that laptop (MySQL, for example). You could use Access too, but Access file corruption is a risk.
Download into this database all the data you need for your report, using incremental loads (only new, modified, and deleted records).
Then run the Excel report, extracting from this database on the same computer.
This should improve the performance of your solution.
Your bigger problem is probably that you query ALL the data on each report generation; a minimal sketch of the incremental-load idea follows below.
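A sketch of that incremental load in Python, assuming the source rows carry a modified timestamp and MySQL is reachable via SQLAlchemy; the connection string, table, and column names are invented for the example:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; requires the PyMySQL driver.
engine = create_engine("mysql+pymysql://report:pw@localhost/reports")

with engine.begin() as conn:
    # High-water mark: the newest row already loaded into the local database.
    last_loaded = conn.execute(
        text("SELECT COALESCE(MAX(modified_at), '1970-01-01') FROM extract_rows")
    ).scalar_one()

    # Pull only rows newer than the high-water mark from the source extract.
    source = pd.read_csv("network_extract.csv", parse_dates=["modified_at"])
    new_rows = source[source["modified_at"] > pd.Timestamp(last_loaded)]

    # Append just the delta; the Excel report then queries this table.
    new_rows.to_sql("extract_rows", conn, if_exists="append", index=False)
```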