How would I write a malware detection software?

How would I write a malware detection software? - security

What are the resources I need to go through to fully understand how this works? I've looked up online but all I got are software solutions rather than how the software actually detects them.
I want to be able to detect a malware that's in my computer. Let's say there's a trojan horse in my Computer. How would I write a program to detect it?
I'm a beginner at Information Security.
Thanks in advance.

Well most endpoint security products have:
- an on-demand scanning component.
- a real-time scanning component.
- hooks into other areas of the OS to inspect data before "released". E.g. Network layer for network born threats.
- a detection engine - includes file extractors
- detection data that can be updated.
- Run time scanning elements.
There are many layers and components that all need to work together to increase protection.
Here are a few scenarios that would need to be covered by a security product.
Malicious file on disk - static scanning - on-demand. Maybe the example you have.
A command line/on-demand scanner would enumerate each directory and file based on what was requested to be scanned. The process would read files and pass streams of data to a detection engine. Depending on the scanning settings configured and exclusions, files would be checked. The engine can understand/unpack the files to check types. Most have a file type detection component. It could just look at the first few bytes of a file as per this - https://linux.die.net/man/5/magic. Quite basic but it gives you an idea on how you can classify a file type before carrying out more classifications. It could be as simple as a few check-sums of a file at different offsets.
In the case of your example of a trojan file. Assuming you are your own virus lab, maybe you have seen the file before, analyzed it an written detection for it as you know it is malicious. Maybe you can just checksum part of the file and publish this data to you product. So you have virusdata.dat, in it you might have a checksum and a name for it. E.g.
123456789, Troj-1
You then have a scanning process, that loads your virus data file at startup and opens the file for scanning. You scanner checksums the file as per the lab scenario and you get a match with the data file. You display the name as it was labelled. This is the most basic example really and not that practical bu hopefully it serves some purpose. Of course you will see the problem with false positives.
Other aspects of a product include:
A process writing a malicious file to disk - real-time.
In order to "see" file access in real-time and get in that stack you would want a file system filter driver. A file system mini filter for example on Windows: https://msdn.microsoft.com/en-us/windows/hardware/drivers/ifs/file-system-minifilter-drivers. This will guarantee that you get access to the file before it's read/written. You can then scan the file before it's written or read by the process to give you a chance to deny access and alert. Note in this scenario you are blocking until you come to a decision whether to allow or block the access. It is for this reason that on-access security products can slow down file I/O. They typically have a number of scanning threads that a driver can pass work to. If all threads are busy scanning then you have a bit of an issue. You need to handle things like zip bombs, etc and bail out before tying up a scanning engine/CPU/Memory etc...
A browser downloading a malicious file.
You could reply on the on-access scanner preventing a file hitting the disk by the browser process but then the browsers can render scripts before hitting the file system. As a result you might want to create a component to intercept traffic before the web browser. There are a few possibilities here. Do you target specific browsers with plugins or do you go down a level and intercept the traffic with a local proxy process. Options include hooking the network layer with a Layered Service Provider (LSP) or WFP (https://msdn.microsoft.com/en-us/windows/hardware/drivers/network/windows-filtering-platform-callout-drivers2). Here you can redirect traffic to an in or out of process proxy to examine the traffic. SSL traffic poses an issue here unless you're going to crack open the pipe again more work.
Then there is run-time protection, where you don't detect a file with a signature but you apply rules to check behavior. For example a process that creates a start-up registry location for itself might be treated as suspicious. Maybe not enough to block the file on it's own but what if the file didn't have a valid signature, the location was the user's temp location. It's created by AutoIt and doesn't have a file version. All of these properties can give weight to the decision of if it should be run and these form the proprietary data of the security vendor and are constantly being refined. Maybe you start detecting applications as suspicious and give the user a warning so they can authorize them.
This is a massively huge topic that touches so many layers. Hopefully this is the sort of thing you had in mind.

Among the literature, "The Art of Computer Virus Research and Defense" from Peter Szor is definitely a "must read".

Related

Testing security of untrusted image upload

I need to test how my website will deal with image files with malware embedded. I'm already satisfied with its validation of the file type using header inspection. I'm looking to test it with genuine image files with embedded malware.
Can I get a jpg that should be recognised as malware to test this? Or make one myself?
Or if there is a better way of doing it, how can I make sure that everything is wired up and working correctly?
It will be deployed on Azure with the images saved to Azure Blob storage. I want to use the [new] Azure storage security service to detect malware.

It is typically difficult to use real malware for testing. You need to be able to handle the file in your test environment with care to not infect your systems, or, escape out into the rest of your network.
To facilitate testing anti-malware solutions vendors provide a number of test files. These are benign files which are detected as malicious. Therefore it is safe to use them in your environment but will serve as a good test case of your solution.
There is one industry standard file (EICAR) which is available for testing anti-malware solutions. As you can see from VirusTotal almost every vendor will detect this file as malicious. Many of them identifying it as a test file.
In your particular use case there is one challenge here, it is not an image file. However, as you mention in the comments I believe that turning off the image detection during this test is much better than using real malware for testing.
Finally, one word of caution with EICAR. As so many products detect this as a virus you need to be careful of the anti-malware solution that is running on the systems that the file ends up on. For example, if your development environment contains an on access scanner attempting to run the test will potentially trigger the on access scanner.

Security issues in accepting image uploads

What are the major security issues to consider when accepting image uploads, beyond the normal stuff for all HTTP uploads?
I'm accepting image uploads, and then showing those images to other users.
How should I verify, for example, that the uploaded image is actually a valid image file? Are there any known vulnerabilities in viewers that are exploitable by malformed image files for which I should be concerned about accidentally passing along exploits? (Quickly googling seems to show that there once was in IE5/6.)
Should I strip all image metadata to help users prevent unintentional information disclosures? Or are there some things that are safe and necessary or useful to allow?
Are there any arcane features of common image formats that could be security vulnerabilities?
Are there any libraries that deal with these issues? (And/or with other issues like converting progressive JPEGs to normal JPEGs, downsampling to standardize sizes, optimizing PNGs, etc.)

Some things I learned recently from a web security video:
The nuclear option is to serve all uploaded content from a separate domain which only serves static content - all features are disabled and nothing important is stored there.
Considering processing images through imagemagick etc. to strip out funny business.
For an example of what you are up against, look up GIFAR, a technique that puts a GIF and Java JAR in the same file.

The risk of propogation of bugs inside image formatters isn't "exactly" your problem, but you can help anyway, by following the general practice of mapping ".jpg" to your executable language, and processing each image manually (in this way you can do refer checks as well).
You need to be careful of:
People uploading code as images (.jpg with actual c# code inside)
any invalid extensions (you check for this)
People trying to do path-related attacks on you
The last one is what you'll need to be wary of, if you're dynamically reading in images (as you will be, if you follow my first bit of advice).
So ensure you only open code in the relevant folder, and, probably more importantly, lock down the user that does this work. I mean the webserver user. Make sure it only has permissions to read from the folder you are working in, and other such logical things.
Stripping metadata? Sure why not, it's quite polite of you, but I wouldn't be nuts about it.

Your biggest risk is that an attacker tries to upload some type of executable code to your server. If the uploaded file is then browsable on the web, the attacker may be able to cause the code to run on your server. Your best protection is to first save the uploaded file to a non-publicly browsable location, try to load it as an image in your programming language and allow it if it can be successfully parsed as an image. A lot of the time people will want to resize the image anyway so really doing this is no extra work. Once the image is validated, you can move it into the publicly browsable area for your web server.
Also, make sure you have a limit on file upload size. Most platforms will have some kind of limit in place by default. You don't want a malicious user filling up your disk with an endless file upload.

One of the vulnerabilities I know of is a "WMF backdoor". WMF is "Windows Metafile"--a graphical format rendered by Windows GDI library. Here's wikipedia article.
The attacker is capable to execute arbitrary code on user's machine. This can happen when user just views the file via the browser, including, but not limited to Internet Explorer. The issue is said to be fixed in 2006.

What security issues appear when users can upload their own files?

I was wondering what security issues appear when the end user of a website can upload files to the server.
For instance if my website allows the users to upload a profile picture, and one user uploads something harmful instead, what could happen? What kind of security should I set up to prevent attacks like this? I'm talking here about images, but what about the case where a user can upload anything into a file-vault kind of application?
It's more a general question than a question about a specific situation, so what are the best practices in that situation? What do you usually do?
I suppose: type validation on upload, different permissions for uploaded files... what else?
EDIT: To clear up the context, I am thinking about a web application where a user can upload any kind of file and then display it in the browser. The file would be stored on the server. The users are whoever uses the website, so there is no trust involved.
I am looking for general answers that could apply for different languages/framework and production environments.

Your first line of defense will be to limit the size of uploaded files, and kill any transfer that is larger than that amount.
File extension validation is probably a good second line of defense. Type validation can be done later... as long as you aren't relying on the (user-supplied) mime-type for said validation.
Why file extension validation? Because that's what most web servers use to identify which files are executable. If your executables aren't locked down to a specific directory (and most likely, they aren't), files with certain extensions will execute anywhere under the site's document root.
File extension checking is best done with a whitelist of the file types you want to accept.
Once you validate the file extension, you can then check to verify that said file is the type its extension claims, either by checking for magic bytes or using the unix file command.
I'm sure there are other concerns that I missed, but hopefully this helps.

Assuming you're dealing with only images, one thing you can do is use an image library to generate thumbnails/consistent image sizes, and throw the original away when you're done. Then you effectively have a single point of vulnerability: your image library. Assuming you keep it up-to-date, you should be fine.
Users won't be able to upload zip files or really any non-image file, because the image library will barf if it tries to resize non-image data, and you can just catch the exception. You'll probably want to do a preliminary check on the filename extension though. No point sending a file through the image library if the filename is "foo.zip".
As for permissions, well... don't set the execute bit. But realistically, permissions won't help protect you much against malicious user input.
If your programming environment allows it, you're going to want to run some of these checks while the upload is in progress. A malicious HTTP client can potentially send a file with an infinite size. IE, it just never stops transmitting random bytes, resulting in a denial of service attack. Or maybe they just upload a gig of video as their profile picture. Most image file formats have a header at the beginning as well. If a client begins to send a file that doesn't match any known image header, you can abort the transfer. But that's starting to move into the realm of overkill. Unless you're Facebook, that kind of thing is probably unnecessary.
Edit
If you allow users to upload scripts and executables, you should make sure that anything uploaded via that form is never served back as anything other than application/octet-stream. Don't try to mix the Content-Type when you're dealing with potentially dangerous uploads. If you're going to tell users they have to worry about their own security (that's effectively what you do when you accept scripts or executables), then everything should be served as application/octet-stream so that the browser doesn't attempt to render it. You should also probably set the Content-Disposition header. It's probably also wise to involve a virus scanner in the pipeline if you want to deal with executables. ClamAV is scriptable and open source, for example.

size validation would be useful too, wouldn't want someone to intentionally upload a 100gb fake image just out of spite now would you :)
Also, you may want to consider something to prevent people from using your bandwidth just for a easy way to host images (I would mostly be concerned with hosting of illegal stuff). Most people would use imageshack for temp image hosting anyway.

For further reading, there's a great article by Acunetix on Why File Upload Forms are a Major Security Threat

With more context, it would be easier to know where the vulberabilities may lie.
If the data could be stored in a database (sounds like it won't be), then you should guard against SQL Injection attacks.
If the data could be displayed in a browser (sounds like it would be), then you may need to guard against HTML/CSS Injection attacks.
If you're using scripting languages (e.g., PHP) on the server, then you may need to guard against injection attacks against those specific languages. With compiled server code (or a poor scripting implementation), there's the chance of buffer overrun attacks.
Don't overlook user data security, too: Can your users trust you to prevent their data from being compromised?
EDIT: If you really want to cover all bases, consider the risks of JPEG and WMF security holes. These could be exploited if a malicious user can upload the files from one system, and then views the files -- or persuades another user to view the files -- from another system.

Size of the content
Restricting certain file types (.jpeg, .png etc., white-listed file types should only be allowed)
file tampering (for ex: a site supporting foreign languages, certain encoding is allowed. the hacker may take advantage of this and adds any script/malicious code encoded and appends to the original file and tries to upload)

Best way to determine whether an attached document will harm a user when opened

We're writing a feature that will allow our users to "attach" things like Word documents, Excel spreadsheets, pictures, pdfs to documents in our application - just like email.
We don't however, want to allow them to attach .exe, .bat, .reg files, or anything else that might harm them if they opened it - so we're proposing to have a whitelist of allowed file types.
Does anyone know of a better way to determine whether a document is safe? (i.e. does not have the ability to harm a user's computer).
Or instead a resource that would give us a list of commonly used safe documents to add to our whitelist as defaults?

You could use a whitelist plus the result of AssocIsDangerous (http://msdn.microsoft.com/en-us/library/bb773465(VS.85).aspx) to determine if the file should be allowed. White list for files to attach without warning, AssocIsDangerous to block altogether, and the remaining could get a default warning dialog.
Be careful about the white list because complex documents can contain macros and their associated applications could contain security vulnerability in their parsers.

What about Word macro viruses? There is no one "safe" document type. What if someone renames a .exe file .doc - is that allowable? Don't depend on the file type or name alone and never just trust client input. Validate it on the server side if at all possible, most likely using an anti-virus program or some other known utility.

Use a reverse proxy setup such as
www <-> HAVP <-> webserver
HAVP (http://www.server-side.de/) is a way to scan http traffic though ClamAV or any other commercial antivirus software. It will prevent users to download infected files. If you need https or anything else, then you can put another reverse proxy or web server in reverse proxy mode that can handle the SSL before HAVP
Nevertheless, it does not work at upload, so it will not prevent the files to be stored on servers, but prevent the files from being downloaded and thus propagated. So use it with a regular file scanning (eg clamscan).

I honestly think you are best served by either Clam AV on Linux or Trend Micro on Windows.

ensuring uploaded files are safe

My boss has come to me and asked how to enure a file uploaded through web page is safe. He wants people to be able to upload pdfs and tiff images (and the like) and his real concern is someone embedding a virus in a pdf that is then viewed/altered (and the virus executed). I just read something on a procedure that could be used to destroy stenographic information emebedded in images by altering least sifnificant bits. Could a similar process be used to enusre that a virus isn't implanted? Does anyone know of any programs that can scrub files?
Update:
So the team argued about this a little bit, and one developer found a post about letting the file download to the file system and having the antivirus software that protects the network check the files there. The poster essentially said that it was too difficult to use the API or the command line for a couple of products. This seems a little kludgy to me, because we are planning on storing the files in the db, but I haven't had to scan files for viruses before. Does anyone have any thoughts or expierence with this?
http://www.softwarebyrob.com/2008/05/15/virus-scanning-from-code/

I'd recommend running your uploaded files through antivirus software such as ClamAV. I don't know about scrubbing files to remove viruses, but this will at least allow you to detect and delete infected files before you view them.

Viruses embedded in image files are unlikely to be a major problem for your application. What will be a problem is JAR files. Image files with JAR trailers can be loaded from any page on the Internet as a Java applet, with same-origin bindings (cookies) pointing into your application and your server.
The best way to handle image uploads is to crop, scale, and transform them into a different image format. Images should have different sizes, hashes, and checksums before and after transformation. For instance, Gravatar, which provides the "buddy icons" for Stack Overflow, forces you to crop your image, and then translates it to a PNG.
Is it possible to construct a malicious PDF or DOC file that will exploit vulnerabilities in Word or Acrobat? Probably. But ClamAV is not going to do a very good job at stopping those attacks; those aren't "viruses", but rather vulnerabilities in viewer software.

It depends on your company's budget but there are hardware devices and software applications that can sit between your web server and the outside world to perform these functions. Some of these are hardware firewalls with anti-virus software built in. Sometimes they are called application gateways or application proxies.
Here are links to an open source gateway that uses Clam-AV:
http://en.wikipedia.org/wiki/Gateway_Anti-Virus
http://gatewayav.sourceforge.net/faq.html

You'd probably need to chain an actual virus scanner to the upload process (the same way many virus scanners ensure that a file you download in your browser is safe).
In order to do this yourself, you'd have to keep it up to date, which means keeping libraries of virus definitions around, which is likely beyond the scope of your application (and may not even be feasible depending on the size of your organization).

Yes, ClamAV should scan the file regardless of the extension.

Use a reverse proxy setup such as
www <-> HAVP <-> webserver
HAVP (http://www.server-side.de/) is a way to scan http traffic though ClamAV or any other commercial antivirus software. It will prevent users to download infected files.
If you need https or anything else, then you can put another reverse proxy or web server in reverse proxy mode that can handle the SSL before HAVP
Nevertheless, it does not work at upload, so it will not prevent the files to be stored on servers, but prevent the files from being downloaded and thus propagated. So use it with a regular file scanning (eg clamscan).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string