Is it legal to web scrape a reddit subreddit for images?


I'm making a /meme {text} command for my Discord bot. For this, I need a source for those memes, and the subreddit r/memes seems to be a good one. I used a package to fetch the search results page for {text}, then extracted the top image (using jQuery-style selectors through another package). It worked, but I want to know whether it's legal if I mention the source of the images/memes and include a link as well. I can't ask on Reddit because my account is new.
Packages used:
https://www.npmjs.com/package/request-promise
https://www.npmjs.com/package/cheerio

StackOverflow is not so much about legal advice. But since the essentials of copyright and authorship laws are important for programmers, too, I think an answer is in order:
Is it legal for you to mass-download readily available resources from the Internet? Yes.
May you redistribute the material you downloaded? That depends on the copyright claims put by the original author on the material you downloaded. In case of doubt, err on the safe side and assume: No!
Furthermore, scraping a site like Reddit with an automated script may violate their terms of service, and it's perfectly legal for them to block your bot (i.e. kick you out of their house). If you try to circumvent their block, technically you're "trespassing" and they may take legal action.
Note that the laws governing this differ from country to country, so it doesn't really make sense to point to specific statutes. But in general these rules apply in some form everywhere in the world.


Tools for Domain Specific Language/Functions

Our users can enter questions that get answered by students. Our users need an extensible, flexible way to define the correct answers to these questions (which are stored as a simple string).
I would like to expose a library of domain specific functions that users can call on to describe the correct answer. Eg:
exact_match("puppy") // means the correct answer is the string 'puppy'
or
contains("yesterday") // means any answer with the word 'yesterday' is correct
The naive implementation would involve eval'ing user supplied strings in a sandboxed runtime (like a javascript vm or ruby vm). But I'd like to go further and only allow specific functions to be called. Any other scripting would be discarded. Such that:
puts("foo"); contains("yesterday")
would be illegal. Since we don't expose or allow puts().
How can I constrain the execution environment to only run a whitelist of functions? Or is there a different approach to build this kind of external-facing DSL instead of trying to constrain an existing language to a subset of functions?

I would check out MPS by JetBrains if I were you; it's an open-source DSL creation tool. I have never used it myself, but from everything I have seen of it, it's very intuitive, and all of their other products are incredibly powerful.

Just because you're creating a DSL, that doesn't necessarily mean that you have to give the user the ability to enter the code in text.
The key to this is providing a list of method names, each with its own special keyword (the "funCode" tag in the example below).
Create a mapping from keyword to code, let users define everything they need, and then let them use it. I would actually build my own XML parser so that it's not hackable, or at least not by anything on a list of zero-day exploits.
<strDefs>
<strDef><strNam>sickStr</strNam>
<strText>sick</strText><strNum>01</strNum></strDef>
<strDef><strNam>pupStr</strNam>
<strText>puppy</strText><strNum>02</strNum></strDef>
</strDefs>
<funDefs>
<funDef><funCode>pfContainsStr</funCode><funLabel>contains</funLabel>
<funNum>01</funNum></funDef>
<funDef><funCode>pfXact</funCode><funLabel>exact_match</funLabel>
<funNum>02</funNum></funDef>
</funDefs>
<queries>
<query><fun>01</fun><str>02</str>
</query>
</queries>
The XML above represents the idea and the structure of the data rather than something users would type; they would enter it through a user interface, so that they are constrained. The code behind the user interface that handles entry of this data should run on your server, and users should only interact with it. Any code that runs in their browser is hackable, because they can just save the page, edit the HTML (and/or JavaScript), and run that, which is their code now, not yours anymore.
You can't really open the door (pandora's box) and allow just anyone to write just any code and have it evaluated / interpreted by the language parser, because some hacker is going to exploit it. You must lock down the strings, probably by having them enter them into your database in an earlier step, and each string gets its own token that YOU generate (a SQL Server primary key is very simple, usable, and secure), but give them a display representation so it's readable to them.
Then give them a list of methods / functions they can use, along with a token (a primary key can also serve here, perhaps with a kind of table prefix) and also a display representation (label).
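To make the token idea concrete, here is a minimal sketch in JavaScript. It assumes the strings and functions from the XML above are already loaded into in-memory tables; the names `strings`, `functions`, and `evaluateQuery` are hypothetical, and in the design described here the tables would live in a database. The point it illustrates is that a query holds only tokens, so nothing the user typed is ever evaluated as code.

```javascript
// Hypothetical in-memory tables; in the design above these would be database rows.
const strings = new Map([
  ["01", "sick"],
  ["02", "puppy"],
]);

// The whitelist: only functions registered here can ever run.
const functions = new Map([
  ["01", { label: "contains", run: (answer, text) => answer.includes(text) }],
  ["02", { label: "exact_match", run: (answer, text) => answer === text }],
]);

// A query is just a pair of tokens, never user-written code.
function evaluateQuery(query, studentAnswer) {
  const fn = functions.get(query.fun);
  const text = strings.get(query.str);
  if (fn === undefined || text === undefined) {
    throw new Error("Unknown token in query");
  }
  return fn.run(studentAnswer, text);
}

// Same query as in the XML: function 01 (contains) applied to string 02 (puppy).
console.log(evaluateQuery({ fun: "01", str: "02" }, "my puppy is sick"));
```

Because lookups go through the two maps, an unknown function token fails with an error instead of reaching any interpreter.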
If you have them put all of their labels into yet another table, you can have SQL make sure that all of their labels are unique to each other in the whole "language", and then you can allow them to try to define their expressions in the language they want to use. This has the advantage that foreign languages can be used, but you don't have to do anything terribly special.
An important piece would be the verify button, that would translate their expression into unique tokens and back again, checking that the round-trip was successful. If it wasn't successful, there's some kind of ambiguity, and you might be able to allow them an option to use the list of tokens as the source in that case.
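The verify step could look roughly like the sketch below. It assumes one fixed expression shape, label("text"), and hypothetical label tables (`functionTokens`, `stringTokens`); the token format is made up for illustration. It translates an expression into tokens, renders the tokens back into text, and checks that the rendered text maps to the same tokens again.

```javascript
// Hypothetical label tables (label -> token); the token format is illustrative.
const functionTokens = new Map([["contains", "F01"], ["exact_match", "F02"]]);
const stringTokens = new Map([["sick", "S01"], ["puppy", "S02"]]);

// Reverse lookup (token -> label) for rendering tokens back into text.
function invert(map) {
  return new Map([...map].map(([label, token]) => [token, label]));
}
const functionLabels = invert(functionTokens);
const stringLabels = invert(stringTokens);

// Accept exactly one shape: label("text"). Anything else is rejected.
function toTokens(expression) {
  const match = /^\s*([A-Za-z_]\w*)\s*\(\s*"([^"]*)"\s*\)\s*$/.exec(expression);
  if (!match) return null;
  const fun = functionTokens.get(match[1]);
  const str = stringTokens.get(match[2]);
  if (fun === undefined || str === undefined) return null;
  return { fun, str };
}

function toExpression(tokens) {
  return `${functionLabels.get(tokens.fun)}("${stringLabels.get(tokens.str)}")`;
}

// The "verify button": expression -> tokens -> expression -> tokens must agree.
function verify(expression) {
  const tokens = toTokens(expression);
  if (tokens === null) return false;
  const again = toTokens(toExpression(tokens));
  return again !== null && again.fun === tokens.fun && again.str === tokens.str;
}

console.log(verify('exact_match("puppy")'));
console.log(verify('puts("foo"); contains("puppy")'));
```

The second call is rejected at the parsing stage, since the input is not a single whitelisted call on a known string.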
If you heavily rely on set-based logic for the underlying foundation of the language and your tables, you should be able to produce a coherent DSL that works. Many DSL creation problems are ones of integrity, where there are underlying assumptions that are contradictory, unintentionally mutually exclusive, or nonsensical. Truth is an unshakeable foundation. Anything else has a lie somewhere -- that you're trying to build on.
Sudoku is illustrative here. When you screw up a Sudoku, you often don't know that you have done so, and you keep building on that false foundation, until you get to the completion of the puzzle, and one whole string of assumptions disagrees with a different string of assumptions. They can't both be true. But you can't tell where you went wrong because you're too far away from the mistake and can not work backwards (easily). All steps taken look correct. A DSL, a database schema, and code, are all this way. Baby steps, that are double- and even triple-checked, and hopefully "correct by inspection", are the best way to "grow" a DSL, slowly, piece-by-piece. The best way to not have flaws is to not add them in the first place.
You don't want bugs in your DSL. Keep it spartan. KISS - Keep it simple, Spartacus! And I have personally found that keeping it set-based, if not overtly then under the covers, accomplishes this very well.
Finally, to be able to think this way, I've studied languages for a long time, and have cultivated a curiosity about how languages have come to be. Books are a good quality source of information, as they have a higher quality level than the internet, which is nevertheless also an indispensable source. Some of my favorite languages: Forth, Factor, SETL, F#, C#, Visual FoxPro (especially for its embedded SQL), T-SQL, Common LISP, Clojure, and probably my favorite, Dylan, an INFIX Lisp without parentheses that Apple experimented with and abandoned, with a syntax that seems to me reminiscent of Pascal, which I sort of liked. The language list is actually much longer than that (and I haven't written code for many of them -- just studied them or their genesis), but that's enough for now.
One of my favorite books, and immensely interesting for the "people" side of it, is "Masterminds of Programming: Conversations with the Creators of Major Programming Languages" (O'Reilly, Theory in Practice series) by Federico Biancuzzi and Chromatic.
By the way, don't let them compromise the integrity of your DSL -- require that it is expressible set-based, and things should go well (IMHO). I hope it works out well for you. Add a comment to my answer telling me how it worked out, if you think of it. And don't forget to choose my answer if you think it's the best! We work hard for the money! ;-)

Using built in functions

I am developing a Windows Forms application in C#. I have heard that one should not use built-in methods and functions in code, since hackers have a deep understanding of such built-in methods and know how to break them. Instead, one should always use one's own functions and methods or, failing that, call built-in functions carefully from those newly written functions. How true is that?
A supporting example for my argument is that I have seen developers always write their own versions of encryption algorithms (like AES, DES, RC4) and hash functions, since they believe that built-in encryption algorithms often have backdoors in them.

What?! No, no, no! Whoever told you this is just wrong.
There is a common fallacy that published source code is more vulnerable to "h4ckerz" because it is available for anyone to spot the flaws in. However, I'm glad you mentioned crypto, because this is an area where this line of reasoning really stands out as the fallacy it is.
One of the most popular questions of all time on https://security.stackexchange.com/ is about a developer (in the OP he was given the pseudonym "Dave") who shared this fear of published code. Dave, like the developer you saw, was trying to homebrew his own encryption algorithm. Here's one of the most popular comments in that thread:
Dave has a fundamentally false premise, that the security of an algorithm relies on (even partially) its obscurity - that's not the case. The security of a hashing algorithm relies on the limits of our understanding of mathematics, and, to a lesser extent, the hardware ability to brute-force it. Once Dave accepts this reality (and it really is reality, read the Wikipedia article on hashing), it's a question of who is smarter - Dave by himself, or a large group of specialists devoted to this very particular problem. (emphasis added)
As a matter of fact, as it stands now, the top two memes on Security.SE are "Don't roll your own" and "Don't be a Dave".
While this has all been about crypto, it applies in general to most open-source software. The chance that a backdoor will get found and fixed goes up with each new set of eyes laid on the code. This should be a simple and uncontroversial premise: the more people are looking for something, the higher the chance it will be found. Yes, this applies to malicious users looking for exploits. However, it also applies to power users, white hat hackers, security researchers, cryptographers, professional developers, and others working for "good", who generally (hopefully) outnumber those working for "evil". The fear of published code also implicitly relies on the false premise that hackers need to see the source code to do bad things. This should be obviously false given the sheer number of proprietary systems whose source code has never been published (various Microsoft and Adobe programs come to mind) and which have nonetheless been plagued by vulnerabilities for years. Maybe having source code to read makes the hacker's job easier, but maybe not -- is it easier to pore over source code looking for an attack vector or to just use scanning tools and scripts against a compiled binary?
tl;dr Don't be a Dave. Rolling your own means you have to be the best at everything to succeed, instead of taking a sampling of the best the community has to offer.
Heartbleed
In your comment, you rebut:
Then why was the Heartbleed bug in openSSL not found and corrected [earlier]?
Because no one was looking at it. That's the sad truth. Here's the difference -- what happened once someone did find it? Now tens of thousands of security researchers, crypto experts, and others are looking at it. Suppose the same kind of vulnerability existed in one of the proprietary products I mentioned earlier, which it very well could. Once it's caught (if it's caught), ask yourself:
Could the team of programmers at the company responsible benefit from the help of the entire worldwide community of security experts, cryptographers, and other analysts right now?
If a bug this critical were discovered (and that's a big if!) in your software, would you be prepared to deal with the fallout caused by your custom implementation?

Unless you know of specific failure modes or weaknesses of the built-in methods your application would use and know how to minimize or eliminate them, it is probably better to use the methods provided by the language or library designers, which will often be both more efficient and more secure than what an average programmer would come up with on the fly for a particular project.
Your example absolutely does not support your view: developing your own encryption algorithm without some serious background in the domain and review by cryptanalysts, and then employing it in security-critical code, is a recipe for disaster. Even developing your own custom implementation of an industry standard encryption algorithm can present problems, and almost certainly will if you are inexperienced at it.
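As a small illustration of leaning on the built-in primitives, here is a sketch using Node's standard `crypto` module (the question is about C#, but the same idea applies to the cryptography classes that ship with .NET). The function names `hashPassword` and `verifyPassword` and the salt:key storage format are assumptions made for this example, not an established convention.

```javascript
const crypto = require("crypto");

// Derive a key from the password with the built-in scrypt function.
// A fresh random salt is generated for every password.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const key = crypto.scryptSync(password, salt, 32);
  // Illustrative storage format: hex salt, a colon, hex key.
  return `${salt.toString("hex")}:${key.toString("hex")}`;
}

// Recompute the key with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, keyHex] = stored.split(":");
  const expected = Buffer.from(keyHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword("correct horse");
console.log(verifyPassword("correct horse", record));
console.log(verifyPassword("wrong guess", record));
```

Everything security-relevant here (random salt, key derivation, constant-time comparison) comes from the library; the custom code only glues the calls together.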

Is it legal to decompile an APK and use part of its code in your app

Is it legal to decompile an APK and use part of its code? (More specifically: a URL connector, which I haven't learned how to write yet.)
The rest of the app (layouts and such) is made by me. Can I publish this app without being concerned on the legal front?

Decompiling is absolutely LEGAL, regardless of what the shills say. At most, you can be sued for unauthorized activity relating to software unless you're redistributing it. Courts in the U.S. have always upheld the right of users to know exactly what code is being installed on their systems by programs they have legitimately obtained.
People REALLY need to quit saying "ILLEGAL" unless they know what they're talking about. There is absolutely NO law in the U.S. that states you cannot copy for private purposes or decompile software. Companies have tried to sue to stop it, but a) that's only civil, not criminal, and therefore not ILLEGAL; and b) they've only won when the content was given to an outside party from whom the companies did not receive payment, i.e. when the person was shown to have broken the law.

It depends on the application license but in general, if decompilation is necessary that means that the author does not allow the use of its code.

If you want it in simple words: "It is absolutely LEGAL to decompile APKs, but it is ILLEGAL to use that code to make something identical to the original."
You can learn from the code, but you can't copy their work. (It sometimes depends on the publisher's copyright.)

TL;DR No.
You can decompile it but you can't use the code.

Decompiling is both illegal and wrong, unless it's your own work.
You can learn what you need on Google, or find open-source stuff using it and learn from that.
It's illegal to decompile ANYTHING without permission.

Should documentation be written by the skilled programmers?

I have always thought that documentation is really important for a project and a team, and that it should be written regularly and in detail. It lets work proceed in parallel without everyone constantly having to ask the skilled programmers. But I find that many developers (even leaders) don't pay much attention to documentation and just take it for granted, which bothers me.
So is my attitude toward documentation right? Is documentation really important? How should I persuade the team leader to pay more attention to it?
If documentation is important, a second question arises: who should write it? IMO, it should be written by the skilled programmers, such as the framework creator (if we use our own framework) and the people responsible for the important parts of the project (the DB schema, the overall architecture, etc.).
The benefits are obvious, such as helping newcomers and easing maintenance.
So in my opinion, the skilled programmers (the definition here may vary) should pay more attention to writing documentation than to writing code once the infrastructure is done.
Am I right about this point?
Thanks for sharing your thoughts on these questions.

There are several kinds of documentation, and one of them is your responsibility:
Document each function, class, structure, member as you complete it
Ideally, you do this in a way that permits automatic extraction of source documentation (e.g. Doxygen). Just be sure to do it as you go.
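For illustration, here is what documenting as you go might look like in JavaScript. It uses JSDoc-style comments, swapped in as the JavaScript counterpart of the Doxygen approach mentioned above; the `average` function itself is a hypothetical example.

```javascript
/**
 * Returns the arithmetic mean of a list of numbers.
 *
 * @param {number[]} values - Non-empty list of numbers to average.
 * @returns {number} The sum of the values divided by their count.
 * @throws {RangeError} If the list is empty.
 */
function average(values) {
  if (values.length === 0) {
    throw new RangeError("values must not be empty");
  }
  const sum = values.reduce((total, value) => total + value, 0);
  return sum / values.length;
}

console.log(average([1, 2, 3]));
```

A documentation generator can extract comment blocks in this form, so the reference pages stay next to the code they describe.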
As far as customer documentation goes, my beliefs are:
Every development firm should employ testers
Testers should contribute heavily to the documentation process
I've been with companies that simply will not pay in full for the final product unless it ships with complete and comprehensive documentation. 10% is usually held back just to ensure that the contractor has incentive to deliver all materials.
As far as testers go, they are really your best friend (or should be). They are the people who know how your software works almost as well as you do. And yes, I agree, you should have at least an outline of a program's functionality; this keeps you from going off on 'value add' tangents. It just makes sense to let the testers be the ones to fill this in, then have the developers review it for accuracy.
You may even find yourself saying "No no no .. it doesn't work that way .. the testers got this wrong ...", then fire up the app and realize that they got it right :) In that respect, it's also helpful to the QA process.

I think it is up to each team, but many times, programmers aren't skilled in writing documentation. That is why there are people like Technical Writers. The programmers should be involved in every step since they are the subject matter experts, but the writers should write.

Your group should create a software development process that defines how you go about developing your software products. Part of that process would define the documents to be written, and in my experience, all members of the development team share in the documentation process -- It can (and should) be a learning process.
Your software development process should define other topics as well, such as code reviews, unit testing, configuration management, etc.
There are lots of examples of processes, from very light to very heavy on the web.

The important questions to ask about any potential documentation are: what is the goal, what is the intended usage, and how often is it expected to be used? You mention helping newcomers, but in practice, is reading the documentation faster and more efficient than getting a walkthrough from another developer? A walkthrough takes time from the other developer, but likely far less time than writing a document would.
Documentation with a strong business case and ROI over alternative options makes sense to create, but there are likely fewer such cases than you imagine, and creating documentation without a clear answer to my initial questions all but guarantees you won't get the ROI for it.

You're not wrong, but fact is that documentation doesn't make money. If anything, poor documentation can increase revenue because you've ensured that clients will need your support contract.
Documentation is also a pain because in theory it's supposed to be done before development, but in reality things change so it's really only worth creating/updating after a major version release.
Ideally, the author should be the business analyst, not the developer.

Another way of looking at documentation is for CYA purposes. If you ever have the misfortune of being on a project where the leadership does not generate documentation, then blame for the bad code can fall on you, unless you have protected yourself with documentation.

The person who speaks the language of your client best is the person who should write the documentation, even if they aren't the person who best understands the product. They should confer with the person who best understands the product, but documentation is not about coding ability; it's about communication.
If you are bad at communicating, it doesn't matter how well you know your product, your documentation will be useless.

Is it better to do roll-your-own or ready-built forum software?

As part of a wide ranging job for a cystic fibrosis support organization, they'd also like a web site set up and I've decided on Apache running on Linux (due to its security and low cost mostly). Other than (fairly) static content, they also want a forum where people can discuss issues with the condition - it'll be attached to a hospital chain so there'll be plenty of medical staff there who know little about the web.
I can handle all the specific coding and Apache setup since I've done it before but I'm interested in people's opinions as to whether I should roll my own forum software or get a hold of some ready-built stuff. I've not had any experience with forum software but I could generate my own (initially buggy, I'm sure) in a month or so.
It'll require registration and login to leave comments (but guest access just to read) and I'd like it to be 'pretty' (excuse me while I remember damning customers for providing similarly vague requirements specs :-) but not necessarily infinitely-configurable with skins/themes/etc.
If anyone has some compelling reasons (and experience with specific products that can provide what I need), I'd be interested in hearing about them. Alternatively, does anyone have any 'gotchas' they experienced while coding their own forum software?

Advantages to rolling your own:
a non-standard custom-built system means you'll be less prone to "standard" attacks (e.g.: a vulnerability in PunBB) since bad guys tend to bother with exploit-hunting only on widely-deployed systems (more return on their investment)
absolute control over how your system works and looks
you'll learn a lot
Disadvantages:
you'll repeat mistakes other people have already solved
it'll take you longer to get up and running
long-term it'll be more maintenance (since you have to fix bugs & add features yourself).
you can't "leverage the community" -- if you choose an off-the-shelf forum that has a plugin system then there's a whole bunch of community add-ons that won't be available for your custom forum software.
There's a GIANT list of forum software on wikipedia -- there's most likely something in there that will suit your needs that you can get up and running quickly.

IMHO the old "don't build what you can buy" adage applies to this (well, the web 2.0 version is obviously "don't build what you can download"). Have a look around at the available forum software, pick one that covers 99% of your needs and tweak it to do the rest.
If you still want to build your own forum software that'll probably be a cool side project but if the job is to get a forum up and running, then go and download one - don't try to mix up the desire to do cool stuff and the day job unless the day job is just to do cool stuff only.

One of the best-kept secrets on the internets is a little gem called FUDforum, by Ilia Alshanetsky.
And yes, it's the same Ilia who wrote xDebug's original profiler code, improved the caching in MMcache, fixed several security bugs in libmcrypt, and who was the release manager for the PHP language from 4.3.3 to 4.3.6+. He is, as my friends in Boston would say, wicked smaart.
Because of this, FUDforum is robust, ridiculously fast and more secure than probably any other part of your web application will ever be. It comes with a neat install script and it has all the features you'll need.
Plus, it's not a high-profile target like phpBB or vBulletin, which means you won't have to worry about spambots constantly banging on the gates.

Having written my own forum software before...
It seems like a simple problem, but when you get into it, you find that there's a lot of little things that you'd like to do nicer, and it takes a lot of time. Mine was cool and all, and I did get paid for it, but if I was doing it over again (which has also happened), I'd use a customizable pre-made solution, and spend all my spare time doing something productive. :)

Forum software tends to have rather complex minimum requirements. A few things you are very likely to need no matter what you do:
Forum/thread/post hierarchy;
User system;
Security system (eg user/admin classes and all kinds of restrictions for users);
Gathering statistics;
BBCodes or some other minimized markup language (NEVER allow users to do full HTML);
File uploads and avatars;
Bans and other punishments;
CAPTCHAs;
etc.
Ready made forum systems provide this out-of-the-box and lots more. Setup is mostly easy too. Why do it all over again yourself?
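To show why the "never allow full HTML" item matters and how much care even a tiny markup layer needs, here is a minimal sketch of a whitelist-based BBCode renderer in JavaScript. The function names and the three supported tags are assumptions for this example; real forum packages handle far more, including URLs, quoting, and misnested tags, which this sketch does not.

```javascript
// Escape every character that has special meaning in HTML, so that
// nothing the user typed can be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Whitelisted BBCode tags and the HTML elements they map to.
const allowedTags = { b: "strong", i: "em", u: "u" };

// Escape first, then convert only the whitelisted tag pairs.
function renderBBCode(input) {
  let html = escapeHtml(input);
  for (const [bb, el] of Object.entries(allowedTags)) {
    const pattern = new RegExp(`\\[${bb}\\]([\\s\\S]*?)\\[/${bb}\\]`, "gi");
    html = html.replace(pattern, `<${el}>$1</${el}>`);
  }
  return html;
}

console.log(renderBBCode("[b]hi[/b] <script>"));
```

The order matters: escaping runs before tag conversion, so the only angle brackets in the output are the ones the renderer itself produced.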

My answer would be: don't reinvent the wheel; there are plenty of forum packages out there. My preference would be RForum if that's all you need.

I'd say, don't waste your time. phpBB 3 is pretty stable, usable and feature-rich forum. We use it at work (for our internal discussions), and I really don't have anything bad to say about it.

I'd concur with most of the above posters that since you want something which appears fairly standard, why reinvent something that already exists?
Like any development, creating forum software is probably much harder than it looks! There will be problems solved in the existing software which you haven't even considered.
It's worth adding that if you do require any specific additional functionality, you can always build that on top of an existing solution anyway, which is especially easy if you have the source code (whether open source or commercial).

From the sound of the website you are building, the forum has the potential to be a highly useful and visible resource. It would be good to go with something that already exists, given the quality of a lot of the products out there and the rich communities that surround them.
I think that vBulletin, although a paid for product, would suit your needs and give you a great base to build a community on.

Vanilla is pretty bare-bones and easy to configure. Perhaps find a system that is easy to extend rather than building everything yourself.

Go ready-built until you need some really unique features that can be tied to money they will make you.
