Node.js NTLM HTTP Authentication, how to handle the 3 types - node.js

I'm trying to get NTLM Authentication working w/ Node.js. I've been reading this ( http://davenport.sourceforge.net/ntlm.html#theNtlmMessageHeaderLayout ). I send the header and get a Base64 authentication header.
I tried converting it from Base64 to UTF8 by making a new Buffer with base64 encoding and then calling toString('utf8') which returns a string something like
NTLMSSP\u0000\u0001\u0000\u0000\u0000\u0007�\b�\u0000
This is where I need help. I understand the NTLMSSP\u0000 is the null terminated signature, but and what the rest is supposed to indicate, but to me it's just garbage. It's unicode characters, but how am I supposed to get actual data out of that? I may be converting it incorrectly, which may be adding to my troubles, but I'm hoping someone can help.

Have a look at http://www.innovation.ch/personal/ronald/ntlm.html
What you receive is a Type-2 Message. The pages explains it in a very practical way. You have to extract the server challenge (nonce) and the server flags.
I just implemented a module for node.js to do just that: https://github.com/SamDecrock/node-http-ntlm

Have you looked at NTLMAPS?
You may be able to solve your problem by using it as a proxy server, but if you really want to implement NTLM auth in Javascript, then NTLMAPS provides lots of working code to study.

Sam posted the best resource I've seen for understanding what's going on.
jclulow on GitHub seems to have implemented it in a Samba library he built.
Take a look here:
https://github.com/jclulow/node-smbhash
under lib\ntlm.js you can see how he's handled the responses.

I've built client a couple of months ago using javascript, ntlm.js. Maybe that can help you get along. It was based on the documentation # innovation.ch and Microsofts own official documentation (see the references on the github page).

Related

Tracking Discord with GA4

Bots are amazing, unless you're Google Analytics
After many months of learning to host my own Discord bot, I finally figured it out! I now have a node server running on my localhost that sends and receives data from my Discord server; it works great. I can do all kinds of the things I want to with my Discord bot.
Given that I work with analytics everyday, one project I want to figure out is how to send data to Google Analytics (specifically GA4) from this node server.
NOTE: I have had success in sending data to my Universal Analytics property. However, as awesome as that was to finally see pageviews coming into, it was equally heartbreaking to recall that Google will be getting rid of Universal Analytics in July of this year.
I have tried the following options:
GET/POST requests to the collect endpoint
This option presented itself as impossible from the get-go. In order to send a request to the collection endpoint, a client_id must be sent along with the request itself. And this client_id is something that must be generated using Google's client id algorithm. So, I can't just make one up.
If you consider this option possible, please let me know why.
Install googleapis npm package
At first, I thought I could just install the googleapis package and be ready to go, but that idea fell on its face immediately too. With this package, I can't send data to GA, I can only read with it.
Find and install a GTM npm package
There are GTM npm packages out there, but I quickly found out that they all require there to be a window object, which is something my node server would not have because it isn't a browser.
How I did this for Universal Analytics
My biggest goal is to do this without using Python, Java, C++ or any other low level languages. Because, that route would require me to learn new languages. Surely it's possible with NodeJS alone... no?
I eventually stumbled upon the idea of actually hosting a webpage as some sort of pseudo-proxy that would send data from the page to GA when accessed by something like a page scraper. It was simple. I created an HTML file that has Google Tag Manager installed on it, and all I had to do was use the puppeteer npm package.
It isn't perfect, but it works and I can use Google Tag Manager to handle and manipulate input, which is wonderful.
Unfortunately, this same method will not work for GA4 because GA4 automatically excludes all identified bot traffic automatically, and there is no way to turn that setting off. It is a very useful feature for GA4, giving it quite a bit more integrity than UA, and I'm not trying to get around that fact, but it is now the Bane of my entire goal.
https://support.google.com/analytics/answer/9888366?hl=en
Where to go from here?
I'm nearly at the end of my wits on figuring this one out. So, either an npm package exists out there that I haven't found yet, or this is a futile project.
Does anyone have any experience in sending data from NodeJS to GA4? (or even GTM?) How did you do it?
...and this client_id is something that must be generated using Google's client id algorithm. So, I can't just make one up...
Why, of course you can. GA4 generates it pretty much the same as UA does. You don't need anything from google to do it.
Besides, instead of mimicking just requests to the collect endpoint, you may just wanna go the MP route right away: https://developers.google.com/analytics/devguides/collection/protocol/ga4 The links #dockeryZ gave, work perfectly fine. Maybe try opening them in incognito, or in a different browser? Maybe you have a plugin blocking analytics urls.
Moreover, you don't really need to reinvent the bicycle. Node already has a few packages to send events to GA4, here's one looking good: https://www.npmjs.com/package/ga4-mp?activeTab=readme
Or you can just use gtag directly to send events. I see a lot of people doing it even on the front-end: https://www.npmjs.com/package/ga-gtag Gtag has a whole api not described in there. Here's more on gtag: https://developers.google.com/tag-platform/gtagjs/reference Note how the library allows you to set the client id there.
The only caveat there is that you'll have to track client ids and session ids manually. Shouldn't be too bad though. Oh, and you will have to redefine the concept of a pageview, I guess. Well, the obvious one is whenever people post in the chan that is different from the previous post in a session. Still, this will have to be defined in the code.
Don't worry about google's bot traffic detection. It's really primitive. Just make sure your useragent doesn't scream "bot" in it. Make something better up.

Save column of numbers from url to python array

I want to save the data column from this url to an array in python. I tried it with, for instance, pandas.save_table:
import pandas as pd
pd.read_table('https://adventofcode.com/2019/day/1/input', sep='')
but I get HTTPError: HTTP Error 400: Bad Request and I think this is not the right way to do that.
Can someone help me with that?
If you try to open the link in your question (in a browser using incognito mode or something similar i.e. delete your cookies) you'll see that you need login into the website to access the page. This is why the you're getting a 400 Bad Request error as a response from the server.
From the FAQ section of the website that you're trying to access:
How does authentication work? Advent of Code uses OAuth to confirm
your identity through other services. When you log in, you only ever
give your credentials to that service - never to Advent of Code. Then,
the service you use tells the Advent of Code servers that you're
really you. In general, this reveals no information about you beyond
what is already public; here are examples from Reddit and GitHub.
Advent of Code will remember your unique ID, names, URL, and image
from the service you use to authenticate.
The website uses OAuth to handle logins to the url that you create will need these access tokens. You can use a library like python-oauth2 to help you with this (there are others so you can read around and decide which you'd like to use). Creating and understanding how to make http requests is beyond the scope of this answer. I'd suggest you have a look around on the internet for some explanations and try again, if you have get stuck please ask another question. Otherwise it'll probably be easier to save the file from your browser...But I'll leave this answer here for the next person who runs into the same problem.

How HTTP response is generated

I'm fairly new to programming and this question is about making sure I get the HTTP protocol correctly. My issue is that when I read about HTTP request/response, it looks like it needs to be in a very specific format with a status code, HTTP version number, headers, a blank line followed by the body.
However, after creating a web app with nodejs/express, I never once had to actually write code that made an HTTP response in this format (I'm assuming, although I don't know for sure that other frameworks like ruby on rails or python/Django are the same). In the express app, I just set up the route handlers to render the appropriate pages, when a request was made to that route.
Is this because express is actually putting the response in the correct HTTP format behind the scenes? In other words, if I looked at the expressJS code, would there be something in that code that actually makes an HTTP response in the HTTP format?
My confusion is that, it seems like the HTTP request/response format is so important but somehow I never had to write any code dealing with it for a node/express application. Maybe this is the entire point of a framework like express... to take out the details so that developers can deal with business logic. And if that is correct, does anyone ever write web apps without a framework to do this. Would you then be responsible for writing code that puts the server's response into the exact HTTP format?
I'm fairly new to programming and this question is about making sure I get the HTTP protocol correctly. My issue is that when I read about HTTP request/response, it looks like it needs to be in a very specific format with a status code, HTTP version number, headers, a blank line followed by the body.
Just to give you an idea, there are probably hundreds of specifications that have something to do with the HTTP protocol. They deal with not only the protocol itself, but also with the data format/encoding for everything you send including headers and all the various content types you can send, authentication schemes, caching, status codes, URL decoding, etc.... You can see some of the specifications involved just by looking here: https://www.w3.org/Protocols/.
Now a simple request and a simple text response could get away with only knowing a few of these specifications, but life is not always that simple.
Is this because express is actually putting the response in the correct HTTP format behind the scenes? In other words, if I looked at the expressJS code, would there be something in that code that actually makes an HTTP response in the HTTP format?
Yes, there would. A combination of Express and the HTTP library that is built into node.js handle all the details of the specification for you. That's the advantage of using a library/framework. They even handle different versions of the protocol and feedback from thousands of other developers have helped them to clean up edge case bugs. A good library/framework allows you to still control any detail about the response (headers, content types, status codes, etc..) without making you have to go through the detail work of actually creating the exact response. This is a good thing. It lets you write code faster and lets you ride on the shoulders of others who have already figured out minutiae details that have nothing to do with the logic of your app.
In fact, one could say the same about the TCP protocol below the HTTP protocol. No regular app developer wants to write their own TCP stack. Instead, you just want a working TCP stack that you can use that's already been tuned and debugged for you.
However, after creating a web app with nodejs/express, I never once had to actually write code that made an HTTP response in this format (I'm assuming, although I don't know for sure that other frameworks like ruby on rails or python/Django are the same). In the express app, I just set up the route handlers to render the appropriate pages, when a request was made to that route.
Yes, this is a good thing. The framework did the detail work for you. You just call res.setHeader(), res.status(), res.cookie(), res.send(), res.json(), etc... and Express makes the entire response for you.
And if that is correct, does anyone ever write web apps without a framework to do this. Would you then be responsible for writing code that puts the server's response into the exact HTTP format?
If you didn't use a framework or library of any kind and were programming at the raw TCP level, then yes you would be responsible for all the details of the HTTP protocol. But, hardly anybody other than library developers ever does this because frankly it's just a waste of time. Every single platform has at least one open source library that does this already and even if you were working on a brand new platform, you could go get an open source body of code and port it to your platform much quicker than you could write all this yourself.
Keep in mind that one of the HUGE advantages of node.js is that there's an enormous body of open source code (mostly in NPM and Github) already prepackaged to work with node.js. And, because node.js is server-side where code memory isn't usually tight and where code just comes from the local hard disk at server init time, there's little downside to grabbing a working and tested package that does what you already need, even if you're only going to use 5% of the functionality in the package. Or, worst case, clone an existing repository and modify it to perfectly suit your needs.
Is this because express is actually putting the response in the
correct HTTP format behind the scenes?
Yes, exactly, HTTP is so ubiquitous that almost all programming languages / frameworks handle the actual writing and parsing of HTTP behind the scenes.
Does anyone ever write web apps without a framework to do this. Would
you then be responsible for writing code that puts the server's
response into the exact HTTP format?
Never (unless you're writing code that needs very low level tweaking of HTTP code or something)

Getting all GitHub users through github-api

The GitHub API documentation says that the url
https://api.github.com/users
will give all users in the order they signed up, but I only seem to get the first 135.
Any ideas how to get the real full list?
Please use since parameter in your GET request.
https://api.github.com/users?since=XXX
Probably it's done this way to limit the resources needed to handle such request. Without such limit it's just asking for DoS attack.
If you check the response headers for that request Github provides pagination links under the header Links
Link: <https://api.github.com/users?since=135>; rel="next", <https://api.github.com/users{?since}>; rel="first"
I believe since their api v3 Github has been moving towards a hypermedia api.
Github Hypermedia API
EDIT
This is beyond the scope of this question but its related. To learn more about hypermedia API and REST. Take a look at these slides by Steve Klabnik
http://steveklabnik.github.com/hypermedia-presentation/#1
Both of the existing answers are 100% correct, but I would advise you to use a wrapper for whatever language you happen to be doing this in. There are plenty of them and there is an official one for ruby (Octokit). Here is a list of all of them.
You can filter on type:user like this:
https://api.github.com/search/users?q=type:user
See Also: GitHub API get total number of users/organizations

Getting XAuth to work with Node.JS & OAuth

I'm building my first node.js app, and trying to integrate it with the Instapaper API. Instapaper only uses XAuth, and I'm a little confused how it works.
Should I use the node-oauth module and try to customize it for xauth? Or should I roll my own xauth? Or something else, maybe?
I'd checkout something like EveryAuth, see how they are handling the various options out there, forking it, and then contributing back with a new implementation.
Good luck man.
Here is how to get it working with the oauth module.
https://stackoverflow.com/a/9645033/186101
You'd still need to handle storing the tokens for your users, but that answer should get you u and running for testing the API in node.

Resources