Using Pandas to Write to a File within a Samba Share - python-3.x

I am using a GCP Cloud Function to read from a BigQuery table and output the results to a CSV file located on a network drive (all the infrastructure needed to communicate with on-prem is already in place). I was wondering whether there is a way to write data out to this location using pandas and pysmb?
I have done a fair bit of reading on the topic and couldn't find a way, but thought someone with more experience may have an idea.
Thank you very much for your help.
Regards,
Scott
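
One possible approach is to render the CSV entirely in memory with pandas and then push the bytes to the share with pysmb's SMBConnection.storeFile. A minimal sketch, assuming direct TCP on port 445; the query, host, share name and credentials below are placeholders:

import io
from google.cloud import bigquery
from smb.SMBConnection import SMBConnection

# Placeholder query, host, share and credentials - replace with your own.
df = bigquery.Client().query("SELECT * FROM `my_project.my_dataset.my_table`").to_dataframe()

# Render the CSV entirely in memory (a Cloud Function can only write to /tmp anyway).
buffer = io.BytesIO(df.to_csv(index=False).encode("utf-8"))

conn = SMBConnection("svc_user", "svc_password", "gcf-client", "FILESERVER",
                     use_ntlm_v2=True, is_direct_tcp=True)
if conn.connect("10.0.0.5", 445):
    # storeFile(share_name, path_within_share, file_object)
    conn.storeFile("reports", "/exports/bq_output.csv", buffer)
    conn.close()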

Related

Automated transfer of files to an SFTP - SPSS Modeler / PS Clem

We currently use SPSS Modeler for our analytics and output to Excel files for reporting. We automate the running of Modeler streams with PS Clementine.
We do have access to SQL Server tables within Modeler via ODBC connections.
What I need to do is automate sending some of the outputs to an SFTP server daily (FileZilla). The outputs currently sit in a OneDrive location.
Ideally I'd like to be able to do some checks on the file, i.e. how many rows of data it holds, etc. Depending on whether the checks pass or fail, I'd then like to email a distribution list either to ask them to investigate or to confirm that the file has been transferred to the SFTP server successfully.
I've done this before using a combination of SAS Cloud / Hadoop / SAS on-prem / Globalscape.
Is there a solution that suits SPSS Modeler / PS Clementine?
I've searched the forum on this but haven't found a relevant solution for my setup, so any help would be very much appreciated.
I don't think this is possible out-of-the-box.
However, I think the following workaround should be possible (assuming SPSS Modeler has enough privileges in your environment): you could use either a Python for Spark export node or the SPSS Modeler scripting API to check the data, and then use Python for the SFTP transfer.
We use Python to extend the functionality of SPSS Modeler all the time, with great success.
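
A rough sketch of what that Python step could look like, assuming paramiko for the SFTP transfer and smtplib for the notification; the paths, hosts and addresses are placeholders:

import smtplib
from email.message import EmailMessage
import pandas as pd
import paramiko

# Placeholder paths, hosts and credentials - replace with your own.
LOCAL_FILE = r"C:\OneDrive\Reports\output.xlsx"
df = pd.read_excel(LOCAL_FILE)          # requires openpyxl
checks_ok = len(df) > 0                 # e.g. the file must contain at least one row

if checks_ok:
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="svc_user", password="svc_password")
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.put(LOCAL_FILE, "/inbound/output.xlsx")
    sftp.close()
    transport.close()

msg = EmailMessage()
msg["Subject"] = "File transferred to SFTP" if checks_ok else "File failed checks - please investigate"
msg["From"] = "reports@example.com"
msg["To"] = "distribution-list@example.com"
msg.set_content("{}: {} rows, transferred={}".format(LOCAL_FILE, len(df), checks_ok))
with smtplib.SMTP("smtp.example.com") as smtp:
    smtp.send_message(msg)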

Best Way to Handle SFTP Files in Azure Data Factory

I'm very new to Azure in general, particularly Data Factory v2, and I'm also very new at this company. We have an ask from a vendor to query data out to a file and then drop it into an Amazon S3 bucket; however, Azure Data Factory does not appear to support this. The client wants to use an SFTP method, but I'm wondering which is the best option. Apparently we have a Linux server all set up, but I'm not sure whether it's a VM in the cloud. I'm learning about Logic Apps and Functions but am not sure which is the best way to go. I'm familiar with WinSCP and have scripted a way to handle SFTP files in the past, but I'm wondering whether that's truly a good way to handle this.
As I said, I'm very new to this, so I want to get some ideas. Have any of you done this in the past, and what would you recommend?
I've done a ton of reading about transferring files to Amazon S3 and to SFTP servers. My head is spinning. I realize my question is general, but my team is very new and we took over this environment from a consulting firm, so we don't have a lot of background.
The outcome I'm hoping for from this post is the best, least-intensive method for sending files via SFTP in Azure Data Factory.
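
If the Functions route ends up being the answer, the S3 drop itself is only a few lines with boto3. A minimal sketch with placeholder bucket names and credentials (which should really live in Key Vault or app settings); the SFTP option would look similar with a library such as paramiko:

import boto3

# Placeholder credentials and names - in practice keep these in Key Vault or app settings.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

# Push the extracted file to the vendor's bucket.
s3.upload_file("/tmp/extract.csv", "vendor-bucket", "inbound/extract.csv")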

Using Python and Google Cloud Engine to process big data

I am an amateur in the world of Python programming and I need help. I have 10GB of data, and I have written Python code in Spyder to process it. Part of the code is provided below:
The code works well on a small sample of the data. However, with 10GB of data my laptop cannot handle it, so I need to use Google Cloud Engine. How can I upload the data and use Google Cloud Engine to run the code?
import os
import pandas as pd
import pickle
import glob
import numpy as np

df = pd.read_pickle(r'C:\user\mydata.pkl')

# For each year from 2018 back to 1995, keep only the rows with
# OverlapYearStart <= that year and pickle the result.
# Note that df is overwritten each pass, so the filter narrows cumulatively.
i = 2018
while i >= 1995:
    df = df[df.OverlapYearStart <= i]
    df.to_pickle(r'C:\user\done\{}.pkl'.format(i))
    i = i - 1
I agree with the previous answer. Just to complement it, you can take a look at AI Platform Notebooks, a managed service that offers an integrated JupyterLab environment, can pull your data from BigQuery, and allows you to scale your application on demand.
On the other hand, I don't know how you have stored your 10GB of data: in CSV files? In a database? As mentioned in the first answer, Cloud Storage allows you to create buckets to store your data; once the data is in Cloud Storage you can load it into BigQuery tables and work with it in your app using Google App Engine or, as suggested earlier, AI Platform Notebooks. This will depend on your solution.
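
A minimal sketch of that Cloud Storage to BigQuery flow, assuming the google-cloud-storage and google-cloud-bigquery client libraries and placeholder bucket/table names:

from google.cloud import storage, bigquery

# Placeholder project, bucket and table names - replace with your own.
BUCKET = "my-analytics-bucket"
TABLE_ID = "my-project.my_dataset.overlap_data"

# Upload the local file to Cloud Storage.
storage.Client().bucket(BUCKET).blob("raw/mydata.csv").upload_from_filename("mydata.csv")

# Load the uploaded CSV into a BigQuery table.
bq = bigquery.Client()
load_job = bq.load_table_from_uri(
    "gs://{}/raw/mydata.csv".format(BUCKET),
    TABLE_ID,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        skip_leading_rows=1,
    ),
)
load_job.result()  # wait for the load to finish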
Probably the easiest thing to start digging into is going to be using App Engine to run the code itself:
https://cloud.google.com/appengine/docs/python/
And use Google Cloud Storage to hold your data objects:
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
I don't know what the output of your application is, so depending on what you want to do with the output, Google Compute Engine may be the right answer if AppEngine doesn't quite fit what you're doing.
https://cloud.google.com/compute/
The first two links take you to the documentation on how to get going with Python for AppEngine and Google Cloud Storage.
Edit to add from the comments: you'll also need to manage the memory footprint of your app. If you're really doing everything in one giant while loop, you'll have memory problems no matter where you run the application, because all 10GB of your data will likely get loaded into memory. Definitely still shift the work into the cloud, in my opinion, but that data will need to be broken up somehow and handled in smaller chunks.
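
One way to break the work into chunks, sketched under the assumption that the source data can be re-exported as CSV (pandas can read a CSV in chunks, but not a pickle); the file and column names follow the question:

import os
import pandas as pd

os.makedirs("done", exist_ok=True)

# Stream the source file in chunks so the full 10GB never sits in memory at once,
# appending each year's subset to its own CSV as we go.
header_written = set()
for chunk in pd.read_csv("mydata.csv", chunksize=500_000):
    for year in range(1995, 2019):
        subset = chunk[chunk.OverlapYearStart <= year]
        subset.to_csv("done/{}.csv".format(year), mode="a", index=False,
                      header=year not in header_written)
        header_written.add(year)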

SSIS alternatives for ETL in Azure Data Factory

Please could you all assist us in answering what we believe to be a rather simple question, but one that is proving really difficult to find a solution to? We have explored tools like Databricks and Snowflake for our Azure-based data warehouse, but keep getting stuck at the same point.
Do you have any info you could share with us on how you would move data from an Azure database (source) to another Azure database (destination) without using SSIS?
We would appreciate any info you would be able to share with us on this matter.
Looking forward to hearing from you
Thanks
Dom
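
One SSIS-free option is a small Python job (run from Azure Functions, a VM, or anywhere with network access to both databases) that streams the data across with pandas and SQLAlchemy. A sketch with placeholder servers, tables and credentials, assuming the Microsoft ODBC Driver 17 for SQL Server is installed:

import pandas as pd
from sqlalchemy import create_engine
from urllib.parse import quote_plus

# Placeholder servers, database names and credentials - replace with your own.
def azure_sql_engine(server, database, user, password):
    odbc = quote_plus(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=" + server + ";DATABASE=" + database + ";UID=" + user + ";PWD=" + password
    )
    return create_engine("mssql+pyodbc:///?odbc_connect=" + odbc)

src = azure_sql_engine("src-server.database.windows.net", "SourceDb", "etl_user", "etl_password")
dst = azure_sql_engine("dst-server.database.windows.net", "TargetDb", "etl_user", "etl_password")

# Copy one table in chunks so large tables never have to fit in memory.
for chunk in pd.read_sql("SELECT * FROM dbo.SalesOrders", src, chunksize=50_000):
    chunk.to_sql("SalesOrders", dst, schema="dbo", if_exists="append", index=False)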

Archive tables in Azure

I have Table storage in Azure in which one of the tables is growing rapidly, and I need to archive any data older than 90 days. I tried reading up online, and the only solution I could find is the eventually consistent transactions pattern: https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/. Although the document uses an employee table as its example and could help me achieve my objective, my intention in posting this question is to find out whether there is a better solution.
Please note I am very new to Azure so might be missing a very easy step to achieve this.
Regards, Tarun
The guide you are referencing is a good source. Note that if you are using Tables for logging data (a common scenario with a requirement to archive older data), then you might want to look at blob storage instead - see the log data pattern in the guide you reference. Btw, AzCopy can also be used to export the data, either to a blob or to the local file system. See here for more information: https://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/#copy-entities-in-an-azure-table-with-azcopy-preview-version-only.
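
For completeness, a rough sketch of the "query old entities, copy them out, then delete them" approach using the azure-data-tables and azure-storage-blob libraries; the table, container and connection string are placeholders, and the Timestamp filter forces a full table scan, so treat this as illustrative only:

import json
from datetime import datetime, timedelta, timezone
from azure.data.tables import TableClient
from azure.storage.blob import BlobServiceClient

# Placeholder connection string, table and container names - replace with your own.
CONN = "<storage-account-connection-string>"
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

table = TableClient.from_connection_string(CONN, table_name="MyFastGrowingTable")
old_rows = [dict(e) for e in table.query_entities(
    "Timestamp lt datetime'{}'".format(cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")))]

# Copy the old entities out to a blob, then delete them from the table.
blob = BlobServiceClient.from_connection_string(CONN).get_blob_client(
    container="table-archive", blob="archive-{}.json".format(cutoff.date()))
blob.upload_blob(json.dumps(old_rows, default=str), overwrite=True)
for row in old_rows:
    table.delete_entity(partition_key=row["PartitionKey"], row_key=row["RowKey"])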

Resources