Accessing Kaggle datasets through its API - nlp

I'm looking to work on some of the public datasets available on Kaggle.
Is it possible to access Kaggle datasets through its API using RStudio?
cheers,
sup

Kaggle just launched their public API in February 2018 (so earlier this year). According to their Github repository, Kaggle/kaggle-api, Kaggle has (at least to date) made their API accessible "using a command line tool implemented in Python".
I too was excited to discover the existence of Kaggle's API and was similarly interested in whether someone had written an API wrapper package for interfacing with it in R. I couldn't find one, so I wrote one myself, which you can find here: https://github.com/mkearney/kaggler. It's literally been less than a week since I created the repo, so I can't speak to its reliability yet, but for now it appears to be the best place to start. And, for the record, as long as people are willing to use it, I have every intention of maintaining at least minimal levels of support for the package (assuming no enthusiastic third parties step in with their own R packages). The API itself is new, though, so it could still be a few months before there is any truly stable option.
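For reference, since the official tool is Python underneath, here is roughly what dataset access looks like through Kaggle's own Python package (the dataset slug below is a placeholder, and an API token saved at ~/.kaggle/kaggle.json is assumed):

```python
# Minimal sketch using Kaggle's official Python package; the dataset
# slug is a placeholder, and an API token at ~/.kaggle/kaggle.json
# (downloadable from your Kaggle account page) is assumed.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the token from ~/.kaggle/kaggle.json
api.dataset_download_files("some-owner/some-dataset", path="data/", unzip=True)
```

An R wrapper ultimately just has to reproduce the HTTP calls this package makes.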
Also, although I understood exactly what you meant by your question, I think it's worth pointing out that, technically, RStudio is an integrated development environment (IDE) and not the source/library/program that connects you to non-RStudio web APIs. What you're actually looking for is an R extension/library/package designed to act as a wrapper/client/interface for Kaggle's API. Of course, this distinction is mostly trivial, because if you work in RStudio (an excellent, open-source, R-centric IDE) to communicate with external APIs like Kaggle's, then you are "access[ing] kaggle datasets through its API using RStudio". But for the sake of giving credit where credit is due, it'd be more accurate to say that you are hoping to leverage RStudio and the {pkgname} package to communicate with Kaggle's API via the R environment.

Related

How can I schedule a YouTube livestream entirely from Linux?

I have a setup on a Raspberry Pi (with its native camera) that uses a cronjob to start an ffmpeg session with its output streaming to YouTube. I re-use the same stream key each time, which is written into my ffmpeg scripts. This all works perfectly each week, automatically starting and stopping at the desired time.
However, each week prior to that livestream, I have to "manually" go into YouTube Studio and "schedule" a new future event. This is easy enough, since it lets me "reuse" previous settings -- all I have to change is the title, date, and time. But I would love to figure out a way to automate that part of the process as well. I assume it involves using the YouTube Data API, but I'm not well versed in APIs, JSON, etc.
(I do have a strong Linux background, bash scripting skills, and general programming background.)
My final solution just needs to:
- create the new scheduled event (maybe 12 hours prior to going live), with title, date, time, "Unlisted" status, category, and so forth -- all the usual settings I set manually within Studio
- retrieve the assigned URL for the upcoming stream (my script will then email that to me)
So, basically, I'm asking for help getting started with the API, or whatever method is capable of doing this. I would prefer to code it on the same Pi that does the ffmpeg encoding (although in a pinch, I could create the schedule from another computer, even Windows). Any examples would be great.
So far, all I have done is create my Google project, enable the YouTube Data API in the project, and create my API key. But I'm not sure where to go from there.
If Python suits you as an implementation language, then I'd recommend using Google's APIs Client Library for Python.
Basically, this library is of good quality and (compared to other client libraries) simple to use. It will, for example, insulate you from having to deal explicitly with REST API calls, JSON and the like. Your code will also work under both GNU/Linux and Windows.
You may begin your journey by reading the official getting started doc: YouTube Live Streaming API Overview. Then I recommend absorbing these two important documents: Life of a Broadcast and Understanding Broadcasts and Streams.
Then go read, understand and run the following sample program from Google: create_broadcast.py. Of course, you'll have to adapt that code to your use case.
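To give a feel for it, here is a minimal sketch of what create_broadcast.py boils down to for your two requirements (the file name, title, and start time below are placeholders; error handling is omitted):

```python
# Minimal sketch of scheduling an unlisted broadcast and printing its URL.
# Assumes client_secret.json was downloaded from the Google Cloud console;
# the title and start time are placeholders.
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/youtube"]

# One-time interactive OAuth consent; opens a browser window.
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)

youtube = build("youtube", "v3", credentials=creds)

# Create the scheduled (upcoming) broadcast.
broadcast = youtube.liveBroadcasts().insert(
    part="snippet,status,contentDetails",
    body={
        "snippet": {
            "title": "Weekly stream",
            "scheduledStartTime": "2024-01-01T18:00:00Z",  # ISO 8601, UTC
        },
        "status": {"privacyStatus": "unlisted"},
        "contentDetails": {"enableAutoStart": True, "enableAutoStop": True},
    },
).execute()

# The watch URL your script can then email to you.
print("https://www.youtube.com/watch?v=" + broadcast["id"])
```

To keep reusing your existing stream key, you would additionally bind the new broadcast to your existing liveStream resource via liveBroadcasts().bind; and for unattended weekly cron runs, you would cache the OAuth refresh token rather than re-running the consent flow each time.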
You'll have to exercise patience and perseverance (since you say you have no prior experience using the YouTube Data API). This API is non-trivial, but it'll pay off at the end of your (programming) journey (you mentioned being versed in programming).
A special mention: to be able to call the live streaming APIs, you will first need to get acquainted with OAuth 2.0 authorization and authentication: Implementing OAuth 2.0 Authentication. There's an official document that you need to absorb: OAuth 2.0 for Mobile & Desktop Apps.
A few more references: the live streaming API has official documentation too. The main site documenting the client library is Google API Client Library for Python Docs. Its source is public and can be found within the client library's public repo under the directory docs.
It is also useful to see the YouTube Data API's list of all instance methods.

OpenAI API and GPT-3 - not clear how I can access it or set up a learning/dev environment?

I am reading tons of GPT-3 samples and have come across many code samples.
None of them mentions how and where I can run and play with the code myself... and especially not that I cannot.
So I did my research and concluded that I cannot, but I may be wrong:
There is no way to run the "thing" on-premises on a dev machine, it is a hosted service by definition (?)
As of now (Oct. 11th 2020) the OpenAI API is in invite only beta (?)
Did I miss something?
As of now (Oct. 11th 2020) the OpenAI API is in invite only beta (?)
Yes, that is correct! Check out their API introduction and application form. On there you'll find the necessary information to apply (in case you want that).
There is no way to run the "thing" on-premises on a dev machine, it is a hosted service by definition
Well, you have access to their API. You can either use the built-in playground or access the API via an HTTP request (and therefore via most programming languages). But there isn't much coding to be done, as you only have a few parameters to pass into the request - e.g. the number of tokens, temperature, etc.
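For illustration, a minimal sketch of such a request in Python (the key is a placeholder; the engine name and parameter values are just examples from the beta-era API):

```python
# Minimal sketch of a completions request over plain HTTP; the key is a
# placeholder and the engine/parameters are example values.
import requests

API_KEY = "sk-..."  # your key from the OpenAI dashboard

response = requests.post(
    "https://api.openai.com/v1/engines/davinci/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": "Write a haiku about APIs:",
        "max_tokens": 32,    # the token limit mentioned above
        "temperature": 0.7,  # randomness of the output
    },
)
print(response.json()["choices"][0]["text"])
```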
They even have a feature that writes the code for you (though that's probably not necessary).
As you can see, you're able to write (and test) your scripts with the different settings and engines. Once you set everything up properly, you may simply export the code. That obviously shouldn't be used in production, as your software is (presumably) more in-depth than just one standardized call.
As of Nov 18 2021, the API is now available publicly with no waitlist. You just need to sign up for an account and get an API key (instant access). You need to pay per usage, but you get some free usage to start with.
Pricing for Da Vinci (the main/best GPT-3 model) is $0.006 per 1000 tokens (approx 750 words). So for about 1 cent, you can get it to read and/or write about 1500 words.
https://openai.com/blog/api-no-waitlist/
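Back-of-envelope, the arithmetic behind that claim (using the figures quoted above):

```python
# Sanity check of the quoted pricing: ~750 words per 1000 tokens,
# at $0.006 per 1000 tokens.
words = 1500
tokens = words / 750 * 1000   # -> 2000 tokens
cost = tokens / 1000 * 0.006  # -> $0.012, i.e. about a cent
print(f"{tokens:.0f} tokens -> ${cost:.3f}")
```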

Python curses interface

I have developed a program using curses. Everything is cool so far, but I was wondering whether there is a good pattern for splitting the different views/panels of my program into smaller chunks callable by my main loop.
Further information:
This program is a rather small automation tool/wizard aiming to ease the installation of our application for customers requiring an on-premises install.
The wizard has three steps and is used to gather information about our customer's installation needs, depending on their chosen architecture.
The first step asks the customer for identification information such as contract number, company name, licence number, and preferred contact.
The second step asks the customer whether they want a standalone installation (all-in-one install) or an N-tier installation, plus the required details such as the requested custom SSL VHostName or tier IPs/credentials.
The third and final step shows the customer a progress bar and information about the required services (MySQL/HTTPd/HAProxy/PHP-FPM) and our application.
I know that I do not especially need to use the curses library for such a program, but our UX team requested it, as it is part of our customer experience with the solution.
You can look at the Forms library. It's a nice extension to ncurses that allows you to better manage input forms like yours. It offers a simple function interface to read the fields, change their properties, etc., as well as many different field types (including regexp-validated fields). In your case, you can simply create three forms, and post/unpost them in succession.
As such a way of doing things is not really usual, do not expect any framework to be available (like those available for web UIs, for instance).
I therefore decided to create my own "framework/factory", so as to be able to split every aspect of my app in a logic similar to that used by web applications.
The source code is dirty and really not Pythonic, but it is working well so far and quite easy to maintain.
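For anyone who lands here, a minimal sketch of the kind of split I mean (the class and step names are illustrative, not my actual code):

```python
# Minimal sketch: each view/panel is its own object, and the main loop
# just drives them in order. Class and step names are illustrative.
import curses


class Step:
    """One view/panel of the wizard."""

    def __init__(self, title):
        self.title = title

    def render(self, screen):
        screen.clear()
        screen.addstr(0, 0, self.title, curses.A_BOLD)
        screen.addstr(2, 0, "Press any key for the next step...")
        screen.refresh()
        screen.getch()


def main(screen):
    curses.curs_set(0)  # hide the cursor
    steps = [
        Step("Step 1: identification"),
        Step("Step 2: architecture"),
        Step("Step 3: installation progress"),
    ]
    for step in steps:  # the main loop only sequences the views
        step.render(screen)


if __name__ == "__main__":
    curses.wrapper(main)
```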
Thanks everyone for your answers and ideas.

Need guidance back into programming

I used to be a programmer and unix sysadmin back in the 90's and early 00's. I wrote business software mostly in BBX, which was non-compiled, procedural BASIC. It was all text based when I started, and I only just got into GUI and OOP with ProvideX by the time I got out. I did do some SQL work and understand basic database concepts.
I've continually dabbled since and tried to keep up by running my own Debian web server here at the house, doing little script programs here and there, and most recently learning PHP and Python. But I would like to get versed in the current state of the industry and hopefully make myself employable in it again.
My current learning project is to write a db app that I can use when drag racing to log run data, report based on various combinations of variables, and predict vehicle performance. This should cover IO, data management, and some complex math. I do want to make it sellable, so it has to be on Windows, since all other racing software is. My two options now are to write it in MS Access, which isn't really programming, or to write a front end in Python and use MySQL for the data.
I assume I should go the Python path out of those two, or should I choose a third path that would pay more dividends toward a job? My biggest concern is wasting my time learning pointless stuff. I assume most of the work out there is db-related and web-based applications, so that would be my ultimate goal. Correct me if I am wrong on that.
Thanks for any input,
Dave
If your goal is to get back into software development, then I recommend that you first ask yourself what type of industry and development setting you'd like to work in. Learn something about the skills those industries are demanding... Then hit Monster and peruse the job qualifications for companies in those industries. Don't limit your view to just language names and broad job descriptions either, but really try to get an idea what sort of developer they're looking for and whether you'd fit in well.
You will be able to find many interesting technologies in lots of different business domains, but what do you really want to be working to help deliver? Python coding, for example, may be interesting, but I'm sure you'd be more interested if it were supporting your motorsport interest in some way versus, say, baby food. When you have the business domains narrowed down, then you can focus on the background required to get jobs in those industries.
You will find an endless set of recommended "hot" technologies if you search for them. I'm sure you can find a list, or post, which will confirm any bias you have on what to learn. But chasing the technology of the day may lead to an unfulfilling day-to-day job if what you're applying it to is not something you find interesting.
I would say that the answer depends on what type of job you want to do. The Fortune 500 company I worked at last summer had everything from mainframe C and COBOL, Java EE, and .NET to Ruby on Rails and Python in applications. There are still a lot of jobs maintaining legacy desktop applications, but the web atmosphere is obviously the future of business computing, and Java EE and .NET are huge players in that arena.
As for the project you are describing: I've done Qt applications with Python, and there are Python libraries for GTK that I've seen used to run apps on Windows. I've also used Java Swing and AWT to build graphical applications, and other than the learning curve for the layout system, it works really well. I once wrote a really basic Windows application using Visual Studio and C#, and that seemed very easy to write.
Enterprise level Java or .NET involve a fairly steep learning curve, so I would have those as a medium-long term goal rather than try and learn that tech immediately.
It seems to me that learning a high-productivity web framework is the best way for you to go. Ruby on Rails seems to be a hot ticket at the moment. I've only had a small look at it, but it seems pretty quick and straightforward. Your drag racing app would be a good place to start.
Build a couple of websites for yourself using the tech. Then build a couple of websites for friends for a nominal fee. After that, see if you can find a real client (perhaps a local business). If you have 2 or 3 of those under your belt, then a potential employer will at least take notice.
One warning, though - people expect web sites to look nice. If you don't have good interface design skills yourself, it will be in your best interest to hire a designer to pretty up whatever you produce.
For a Windows desktop application, you can use C# and the various .NET APIs, and store your data in either a Microsoft-provided database, or SQLite, which is a reliable, server-free SQL implementation. (I don't know anything about Microsoft tech, hence the vagueness of my answer.) There is a lot of work available using C# and .NET, and it should be easy to pick up. You'll meet less resistance on the Windows platform with Microsoft's kit than with third-party languages like Python.
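If you do end up on the Python path instead, note that Python's standard library ships SQLite bindings, so a first cut of the run log needs no database server at all. A minimal sketch (the table layout is purely illustrative):

```python
# Minimal sketch of a server-free run log using the stdlib sqlite3
# module; the schema and sample values are purely illustrative.
import sqlite3

conn = sqlite3.connect("runs.db")  # a single file, no server needed
conn.execute(
    """CREATE TABLE IF NOT EXISTS runs (
           run_date TEXT,
           track TEXT,
           et_seconds REAL,  -- elapsed time
           mph REAL
       )"""
)
conn.execute(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    ("2008-06-01", "local strip", 12.34, 108.5),
)
conn.commit()

# Report: best elapsed time per track.
for track, best in conn.execute(
    "SELECT track, MIN(et_seconds) FROM runs GROUP BY track"
):
    print(track, best)
conn.close()
```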

Domain repository for requirements management - build or buy?

In my organisation, we have some very inefficient processes around managing requirements: tracking what was actually delivered in which versions, whether subsequent releases break previous functionality, etc. It's currently all managed manually. The requirements are spread over several documents and issue trackers, and the implementation details are in code in Subversion, Jira, and TestLink. I'm trying to put together a system that consolidates the requirements info so that it is sourced from a single, authoritative source, is accessible via standard interfaces (web services, browsers, etc.), and can be automatically validated against. The actual domain knowledge is not that complicated but is highly proprietary and non-standard (i.e., not just customers with addresses, emails, etc.), and it is relational: customers have certain functionalities, features switched on/off, and specific datasources hooked up - all on specific versions. So modelling this should be straightforward.
Can anyone advise on the best approach for this? I am certain that I could develop a system from scratch that matches the requirements exactly, in, say, Ruby on Rails, Grails, or some RAD framework. But I'm having difficulty getting management buy-in; they would feel safer with an off-the-shelf solution.
Can anyone recommend such a system? Or am I better off building it from scratch, as I feel I am? I'm afraid a bought system would take just as long to deploy, and would not meet our requirements.
Thanks for any advice.
I believe that you are describing two different problems. The first is getting everyone to standardize and the second is selecting a good tool for requirements management. I wouldn't worry so much about the tool as I would the process and the people. Having the best tool in the world won't help if your various project managers don't want to share.
So, my suggestion is to start simple. Grab Redmine or Trac and take on the challenge of getting everyone to standardize. Once you have everyone in the right mindset then you can improve the tools you use for storage.
{disclaimer - mentioning my employer's product}
My brief experiments with the commercial tool RequisitePro seemed pretty good to me. It allowed one to annotate existing Word docs and create a real-time linked database of the identified requisites, then perform lots of analysis and tracking of them.
Sometimes when I see a commercial product I think, "Oh, well, nice glossy bits, but the fundamentals I could knock up in Perl in a weekend." That's not the case with this stuff. I would certainly look at commercial products in this space and experiment with a couple (ReqPro has a free trial; I guess the competition will too) before spending time on my own development.
Thanks a mill for the reply. I will take a look at RequisitePro; at least I'll be following the "Nobody ever got fired for buying IBM" strategy ;) You're right, and I kinda knew it: in these situations, buy is better. It is tempting when I can visualise throwing it together quickly, but there are other tradeoffs and risks with that approach.
Thanks,
Justin
While RequisitePro enforces a standard, and that can certainly help you in your task, I'd second Mark on trying to standardize the input by agreement with personnel and using a more flexible tool like Trac or Redmine (both of which have incredibly fast deploy and setup times, especially if you host them from a VM), or even a custom one if you can get management to endorse your project.
