How can I detect a tampered big file in a short time?

I feel very frustrated in the battle with the cheaters of my game. I found that a lot of hackers tamper with my game data to evade the anti-cheat system. I have tried some methods to verify whether the game data has been tampered with, such as encrypting my asset package or checking the hash of the package header.
However, I got stuck on the issue that my asset package is huge, almost 1-3 GB. I know a digital signature does very well at verifying data, but I need this to be done in almost real time.
It seems I have to make a trade-off between verifying the whole file and performance. Is there any way to verify a huge file in a short time?

AES-NI-based hashing such as Meow Hash can easily reach 16 bytes per cycle on a single thread; that is, for data already in cache, it processes tens of gigabytes of input per second. Obviously, in reality, memory and disk I/O speed become the limiting factor, but those limits apply to any method, so you can think of them as the upper bound. Since it's not designed for security, it's also possible for cheaters to quickly find a viable collision.
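To make the numbers concrete, here is a minimal C++ sketch of streaming a multi-gigabyte package through a fast non-cryptographic hash. FNV-1a is used purely as a stand-in for something like Meow Hash or xxHash (it is far slower, but the structure is the same); the file name and buffer size are placeholders.

```cpp
// Sketch: stream a large asset package through a fast non-cryptographic
// hash. FNV-1a stands in for Meow Hash / xxHash; "assets.pak" and the
// 1 MiB buffer are hypothetical.
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

uint64_t fnv1a_update(uint64_t h, const char* data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;            // FNV-1a 64-bit prime
    }
    return h;
}

int main() {
    std::ifstream in("assets.pak", std::ios::binary);    // hypothetical package
    std::vector<char> buf(1 << 20);                       // 1 MiB read buffer
    uint64_t h = 14695981039346656037ULL;                 // FNV-1a offset basis
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        h = fnv1a_update(h, buf.data(), static_cast<size_t>(in.gcount()));
    }
    std::printf("package hash: %016llx\n", static_cast<unsigned long long>(h));
}
```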
But even if you find a sweet spot between speed and security, you're still relying on cheaters not hooking your file/memory I/O (for example, redirecting reads to a pristine copy of the asset). Additionally, it's still possible for cheaters to simply NOP out any asset-verification call. Since you care about cheaters, I'd assume this is an online game. The more common practice is to rearchitect the game so that cheating is prevented even with tampered assets: Valorant moved line-of-sight calculations to the server side, and League of Legends added a kernel-mode anti-cheat driver.

Related

External multithreaded sort

I need to implement an external multithreaded sort. I don't have experience in multithreaded programming, and I'm not sure whether my algorithm is good enough; I also don't know how to complete it. My idea is:
A thread reads the next block of data from the input file
Sorts it using a standard algorithm (std::sort)
Writes it to another file
After this I have to merge these files. How should I do this?
If I wait until the input file has been entirely processed before merging, I end up with a lot of temporary files.
If I try to merge files straight after sorting, I cannot come up with an algorithm that avoids merging files of quite different sizes, which would lead to O(N^2) complexity.
Also, I suppose this is a very common task; however, I cannot find a good ready-made algorithm on the internet. I would be very grateful for a link to one, especially to its C++ implementation.
Well, the answer isn't that simple; it actually depends on many factors, among them the number of items you wish to process and the relative speed of your storage system and CPUs.
But the first question is why you would use multithreading at all here. Is the data too big to be held in memory? Are there so many items that even a quicksort can't finish fast enough? Do you want to take advantage of multiple processors or cores? We don't know.
I would suggest that you first write some test routines to measure the time needed to read and write the input and output files, as well as the CPU time needed for sorting. Please note that I/O is generally a lot slower than CPU execution (they aren't really even comparable), and I/O may not be efficient if you read data in parallel: with a spinning disk there is one head that has to move back and forth, so reads are in effect serialized, and even a non-mechanical drive is still a single device with its own input and output channels. That is, the additional overhead of reading and writing temporary files may more than eliminate any benefit from multithreading. So I would say: first write a version that reads the whole file into memory, sorts it, and writes it back out, with timers around each phase to check their relative cost. If I/O is even some 30% of the total time (yes, that little!), it's definitely not worth it, because with all the reading, merging, and writing of temporary files that fraction will grow considerably, so a solution that processes the whole data set at once would be preferable.
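A minimal sketch of the kind of timing harness described above, assuming the records are plain ints in a binary file; the file names and record type are placeholders.

```cpp
// Sketch: read the whole input, sort it in memory, write it out, and
// report how the time splits between I/O and sorting. "input.bin",
// "output.bin", and the int record type are placeholders.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;

    // --- read the whole file ---
    auto t0 = clock::now();
    std::ifstream in("input.bin", std::ios::binary | std::ios::ate);
    const std::streamsize bytes = in.tellg();
    in.seekg(0);
    std::vector<int> data(static_cast<size_t>(bytes) / sizeof(int));
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(int)));
    auto t1 = clock::now();

    // --- sort in memory ---
    std::sort(data.begin(), data.end());
    auto t2 = clock::now();

    // --- write the result ---
    std::ofstream out("output.bin", std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(int)));
    out.flush();
    auto t3 = clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::printf("read: %lld ms, sort: %lld ms, write: %lld ms\n",
                (long long)ms(t0, t1), (long long)ms(t1, t2), (long long)ms(t2, t3));
}
```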
To conclude, I don't see why you would use multithreading here; the only reason, in my opinion, would be if the data actually arrives in blocks, but even then take into account my considerations above about relative I/O and CPU speeds and the additional overhead of reading/writing temporary files. And a hint: your file access must be very efficient, e.g., reading and writing in large blocks through application buffers rather than item by item (this saves on system calls); otherwise it can have a detrimental effect, especially if the file(s) are stored on a machine other than yours (e.g., a server).
Hope you find my suggestions useful.
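If external sorting does turn out to be necessary, the merge step is normally done as a single k-way merge over all the sorted runs using a min-heap, which sidesteps the O(N^2) pairwise merging worried about in the question. A minimal single-threaded sketch, assuming text runs with one integer per line (the run file names and format are placeholders):

```cpp
// Sketch: k-way merge of sorted run files using a min-heap, so run sizes
// never need to match. Run file names and the one-integer-per-line format
// are placeholders for whatever the sorting phase produced.
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::string> runNames = {"run0.txt", "run1.txt", "run2.txt"};
    std::vector<std::ifstream> runs;
    for (const auto& name : runNames) runs.emplace_back(name);

    // (value, index of the run it came from)
    using Item = std::pair<long long, size_t>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    // prime the heap with the first value of each run
    for (size_t i = 0; i < runs.size(); ++i) {
        long long v;
        if (runs[i] >> v) heap.push({v, i});
    }

    std::ofstream out("sorted.txt");
    while (!heap.empty()) {
        auto [v, i] = heap.top();
        heap.pop();
        out << v << '\n';
        long long next;
        if (runs[i] >> next) heap.push({next, i});   // refill from the same run
    }
}
```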

Is bcrypt viable for large web sites?

I've been on the bcrypt bandwagon for a while now, but I'm having trouble answering a simple nagging question.
Imagine I have a reasonably successful web site in the U.S... about 100,000 active users that each have activity patterns requiring 2 to 3 authentication attempts on average over the course of a typical American business day (12 hours when you include timezones). That's 250,000 authentication requests per day, or about 5.8 authentications per second.
One of the neat things about bcrypt is that you can tune it, so that over time it scales as hardware does, to stay ahead of the crackers. A common tuning is to make it take just over 1/10 of a second per hash creation... let's say I get it to 0.173 seconds per hash. I chose that number because it just so happens that 0.173 seconds per hash works out to about 5.8 hashes per second. In other words, my hypothetical web server is literally spending all its time doing nothing but authenticating users, never mind doing any actually useful work.
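Spelling out the arithmetic behind those figures (taking 2.5 authentications per user per day as the midpoint of the 2-3 range above):

$$
100{,}000 \times 2.5 = 250{,}000 \ \tfrac{\text{auth}}{\text{day}},\qquad
\frac{250{,}000}{12 \times 3600\ \text{s}} \approx 5.8 \ \tfrac{\text{auth}}{\text{s}},\qquad
\frac{1}{0.173\ \text{s}} \approx 5.8 \ \tfrac{\text{hashes}}{\text{s}}.
$$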
To address this issue, I would have to either tune bcrypt way down (not a good idea) or get a dedicated server just to do authentications, and nothing else. Now imagine that the site grows and adds another 100,000 users. Suddenly I need two servers: again, doing nothing but authentication. Don't even start thinking about load spikes, as you have light and busy periods throughout a day.
As I see it right now, this is one of those problems that would be nice to have, and bcrypt would still be worth the trouble. But I'd like to know: am I missing something obvious here? Something subtle? Or can anyone out there actually point to a well-known web site running a whole server farm just for the authentication portion of their site?
Even if you tune bcrypt to take only, say, 1/1000 of a second, that's still quite a bit slower than simple hashing — a quick and dirty Perl benchmark says my not-so-new computer can calculate about 300,000 SHA-256 hashes per second.
Yes, the difference between 1000 and 300,000 is only about 8 bits, but that's still 8 bits of security margin you wouldn't have otherwise, and that difference is only going to increase as CPUs get faster.
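For completeness, that figure is just the ratio of the two hashing rates:

$$
\log_2\!\left(\frac{300{,}000}{1{,}000}\right) = \log_2 300 \approx 8.2\ \text{bits}.
$$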
Also, if you use scrypt instead of bcrypt, it will retain its memory-hardness property even if the iteration count is lowered, which will still make brute forcing it harder to parallelize.

For a peer-to-peer app that can resume file transfers, is it sufficient to check filesize/modified date for changes before resuming a file?

I'm working on a networked application that has a peer-to-peer file transfer component (think instant messenger), and I'd like to make it able to resume file transfers gracefully.
If there is an ongoing file transfer and one user drops out, the recipient still knows how much of the file he has successfully received and therefore where to resume the transfer from. However, if the file has changed in the meantime, how can this be detected? To be clear, I'm not focused here on corruption by the network so much as on the source file being altered.
The way I was starting out on this was by having the sender hash the file before sending it, so the recipient has a hash to check the finished file against. However, this only detects corruption at the very end, unless each resume also rehashes. That problem could be alleviated by viewing the file in chunks and hashing each of those. The bigger problem with hashing, though, is that it can take a really, really long time, which is a bad user experience when the user just wants to send something immediately (e.g., a Linux ISO sitting on a slow network share).
I was thinking about changing to simply checking the file size and modified date each time a transfer begins or is resumed. While this is clearly not foolproof, unless I'm missing something (and please correct me if I am), almost every means an end-user would be using to alter files will be well-behaved and at the very least mark the modified date, and even if not, the change in size should catch 99% of cases. Does this seem like an acceptable compromise? Bad idea?
How do the established protocols handle this?
The quick answer to your question is that it will work in most cases, unless files are modified often.
Instead of cryptographic hashes, use checksums (CRC32, for example); they are much faster for checking whether a file has been modified.
If a connection breaks, you only need to send the computed chunk checksums back to the source, which can then work out which chunks have been modified in the meantime, decide which ones to resend, and send the missing chunks.
Chunks plus checksums are the best trade-off, compared with hashing whole files, as far as user experience is concerned.

Sensitive Data In Memory

I'm working on a Java password manager and I currently have all of the user's data, after being decrypted from a file, sitting around in memory at all times and stored plainly as a String for displaying in the UI etc.
Is this a security risk in any way? I'm particularly concerned with someone "dumping" or reading the computer's memory in some way and finding a user's naked data.
I've considered keeping all sensitive pieces of data (the passwords) encrypted and only decrypting each piece as needed and destroying thereafter... but I'd rather not go through and change a lot of code on a superstition.
If your adversary has the ability to run arbitrary code on your target machine (with the debug privileges required to dump a process image), you are all sorts of screwed.
If your adversary has the ability to read memory at a distance accurately (ie. TEMPEST), you are all sorts of screwed.
Protect the data in transit and in storage (on the wire and on the disk), but don't worry* about data in memory.
*Ok, there are classes of programs that DO need to worry. 99.99% of all applications don't, I'm betting yours doesn't.
It is worth noting that the OS might decide to swap memory to disk, where it might remain for quite a while. Of course, reading the swap file requires strong privileges, but who knows? The user's laptop might get stolen...
Yes, it certainly is, especially since you can quite trivially attach a debugger to an application. Most code dealing with encryption and sensitive data uses char arrays instead of Strings; with char arrays you can overwrite the memory afterwards, limiting the lifetime of the sensitive data.
In theory, you cannot protect anything in memory completely. Some group out there managed to deep freeze the memory chips and read their contents 4 hours after the computer was turned off. Even without going to such lengths, a debugger and a breakpoint at just the right time will do the trick.
Practically though, just don't hold the plaintext in memory for longer than absolutely necessary. A determined enough attacker will get to it, but oh well.

How might one go about implementing a disk fragmenter?

I have a few ideas I would like to try out in the disk defragmentation arena. I came to the conclusion that, as a precursor to the implementation, it would be useful to be able to put a disk into a state where it is fragmented. This seems to me to be a state that is more difficult to achieve than a defragmented one. I would assume that the commercial defragmenter companies have probably solved this issue.
So my question.....
How might one go about implementing a fragmenter? What makes sense in the context that it would be used, to test a defragmenter?
Maybe instead of fragmenting an actual disk, you should test your defragmentation algorithm on a simulated/mock disk? Only once you're satisfied that the algorithm itself works as specified would you do the testing on actual disks using the actual disk API.
You could even take snapshots of actual fragmented disks (yours or someone else's) and use that data as a mock model for testing.
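For what it's worth, a mock disk doesn't have to be elaborate; here is a toy sketch (everything in it, including the block-to-file mapping and the scoring rule, is made up for illustration):

```cpp
// Toy mock-disk model: the "disk" is an array mapping block index -> file
// id (-1 = free), and the fragmentation score counts how many extra,
// non-contiguous extents each file occupies beyond its first.
#include <cstddef>
#include <cstdio>
#include <unordered_map>
#include <vector>

int fragmentationScore(const std::vector<int>& disk) {
    std::unordered_map<int, int> extents;   // file id -> number of extents
    for (std::size_t i = 0; i < disk.size(); ++i) {
        int f = disk[i];
        if (f < 0) continue;                           // free block
        if (i == 0 || disk[i - 1] != f) ++extents[f];  // a new extent starts here
    }
    int score = 0;
    for (const auto& kv : extents) score += kv.second - 1;
    return score;   // 0 means every file is contiguous
}

int main() {
    //                      file0    free  file1   file0 again  free
    std::vector<int> disk = {0, 0,   -1,   1, 1,   0, 0,        -1};
    std::printf("fragmentation score: %d\n", fragmentationScore(disk));  // 1
}
```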
How you can best fragment a disk depends on the file system.
In general, concurrently open a large number of files. Opening a file will create a new directory entry but won't cause any blocks to be written for that file. Now go through each file in turn, writing one block at a time. This will typically consume the next free block each time, which leads to all your files being fragmented with respect to each other.
Fragmenting existing files is another matter. Basically, do the same, but on copies of the existing files, then delete the originals and rename the copies.
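A minimal sketch of the round-robin interleaving idea; the file count, block size, and file names are arbitrary, and how well it works depends entirely on the file system's allocator:

```cpp
// Sketch: create many files, then repeatedly append one block to each in
// turn so their allocations interleave on disk. File count, 4 KiB block
// size, and the frag_*.bin names are placeholders.
#include <fstream>
#include <string>
#include <vector>

int main() {
    const int fileCount = 64;
    const int blocksPerFile = 256;
    const std::vector<char> block(4096, 'x');   // one 4 KiB block of filler

    // open (and truncate) all files up front so every file exists first
    std::vector<std::ofstream> files;
    for (int i = 0; i < fileCount; ++i)
        files.emplace_back("frag_" + std::to_string(i) + ".bin",
                           std::ios::binary | std::ios::trunc);

    // write one block to each file in turn
    for (int b = 0; b < blocksPerFile; ++b) {
        for (auto& f : files) {
            f.write(block.data(), static_cast<std::streamsize>(block.size()));
            f.flush();   // push data out each round rather than batching it
        }
    }
}
```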
I may be oversimplifying here, but if you artificially fragment the disk, won't any tests you run only hold for the fragmentation created by your fragmenter rather than for any real-world fragmentation? You may end up optimizing for assumptions in the fragmenter tool that don't represent real-world occurrences.
Wouldn't it be easier and more accurate to take some disk images of fragmented disks? Do you have any friends or colleagues who trust you not to do anything anti-social with their data?
Fragmentation is essentially a mathematical problem in which you are trying to maximize the distance the hard drive's head has to travel while performing a specific operation. So, in order to fragment something effectively, you need to define that specific operation first.
