Why do some applications Base64-encode HMACs even when it's not for mail purposes?
Is it a vestige from older implementations or practices?
If you search Google for the words 'HMAC', 'base' and '64', you will see that many people do it, and I don't know why. I'm pretty sure that it has nothing to do with either storing or outputting it.
If you want to send binary data inside a SOAP envelope or as JSON data, you risk that part of the raw data is interpreted as structural elements of these formats (e.g. '<', '[', '{').
Therefore you need to encode it as Base64 to preserve its contents on the wire.
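As a minimal sketch of that point (the key, message and field names below are made up for illustration, not taken from any particular API), this is roughly what Base64-encoding an HMAC for a JSON payload could look like in Python:

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key and message, for illustration only.
key = b"secret-key"
message = b"amount=100&to=alice"

# The raw digest is 32 arbitrary bytes, which cannot be placed in JSON as-is.
raw_mac = hmac.new(key, message, hashlib.sha256).digest()

# Base64 turns the bytes into plain ASCII text that is safe inside JSON or XML.
encoded_mac = base64.b64encode(raw_mac).decode("ascii")
payload = json.dumps({"message": message.decode("ascii"), "mac": encoded_mac})

# The receiver decodes the text back to bytes and compares in constant time.
received = json.loads(payload)
received_mac = base64.b64decode(received["mac"])
expected_mac = hmac.new(
    key, received["message"].encode("ascii"), hashlib.sha256
).digest()
is_valid = hmac.compare_digest(received_mac, expected_mac)
```

Hex encoding would work just as well here; Base64 is simply the more compact of the two common text encodings.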
I'm trying to watch Spotify's network packets on the emulator, but the data sent and received appears to be corrupted. How can I solve this problem?
Images:
I tried reading the data many times but it always looks like this. I want to see the data properly in JSON form.
This data is not corrupted, it's just not in the format you want.
You can see the format by looking at the content-type header, which says application/protobuf.
This is not JSON data. Instead it's Protobuf, which is a general-purpose serialization format, similar to JSON or XML, but designed to be faster to process and smaller to transfer, in part by being sent as raw binary data, instead of readable strings.
To deserialize this 100% correctly, you will need the Protobuf schema for the API you're talking to (a .proto file). In many cases, unless this is your own or a publicly documented API, that's not going to be available.
You can still try to decode the data into raw data types, although that might not let you decode all of the information immediately. There's more info on that in the question "raw decoder for protobufs format". Decoding data like this works best using the protoc command-line tool (its --decode_raw option reads a message without a schema), but you may also be able to decode it using https://protobuf-decoder.netlify.app/. Note that this site takes hex data, not a raw string like you have here, so you'll have to pick the 'Hex' option for the body in HTTP Toolkit to copy the hex codes over instead.
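To give an idea of what "decoding into raw data types" means, here is a rough Python sketch of a schema-less decoder for the Protobuf wire format. It is not a replacement for protoc: it only splits a message into field numbers, wire types and raw values, it does not recurse into nested messages, and the sample hex body is made up rather than taken from Spotify.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear means this was the last byte
            return result, pos
        shift += 7


def decode_raw(data: bytes) -> list[tuple[int, int, object]]:
    """Split a message into (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (ints, bools, enums)
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Made-up sample in hex form: field 1 = varint 150, field 2 = b"testing".
body = bytes.fromhex("08 96 01 12 07 74 65 73 74 69 6e 67")
fields = decode_raw(body)
```

Without the .proto file you still have to guess what each field number means, and whether a length-delimited value is a string, raw bytes or a nested message.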
I can only extract the data from here. How can I use it with Python requests? I want to convert it to a dictionary. Or can we solve this using https://github.com/spotify/proto-requests? I'm trying to write a program for Spotify.
The TipTap editor and its progeny quasar-tiptap can export user-created content in the browser in both HTML and JSON formats. If I plan on allowing round trips of the user data to my server, what are the pros and cons of using either format for storage?
I would assume HTML has a greater likelihood of XSS attacks, and indeed such a vulnerability has been found (and rectified) in the past. And using JSON would be easier for backend parsing should it ever be required.
Beyond this, are there any major benefits of using either format? Preserving fidelity of user input is important. Size is important (any differences in image storage?). Editor performance is important. Scripting attack vulnerability is extremely important.
Which to choose?
I am sure that 12 months later you have made your choice. The usual caveats apply: this question is not really answerable and can only really elicit opinions. However, here is the best I can offer (and I am definitely no expert):
User-entered text is a risk (cross-site scripting, SQL injection, potentially creating a false alert dialog with disinformation for a user reading another user's content) regardless of how the rich text data is sent to the server and/or stored. This is because eventually the content will a) very likely be stored in a database and b) be sent to and interpreted by a browser in HTML format. Whether it goes through an intermediary JSON format is largely irrelevant. Stripping out all unwanted tags, attributes and SQL commands (known as sanitising) is going to be an extremely important server responsibility, whatever the format. Many more well-tested sanitising libraries exist for HTML, in many more languages and server-side technologies, than for the relatively niche ProseMirror JSON format used by TipTap.
The TipTap documentation discusses the advantages of the two formats.
The ProseMirror JSON format is reasonably verbose, and inspecting the JSON and HTML side by side, the JSON format seems to require more characters. Obviously it will depend on the rich text itself (how many formatting marks etc.), but I would not worry about this in terms of server storage. Given you are likely using Vue or React with TipTap, you can probably set up an editor with two separate divs displaying the two outputs side by side. The TipTap documentation also shows the two formats for the same text at the bottom of one of its pages.
Editor performance - if you store the data as HTML you can pass it straight to the client. If you store it as JSON you will have to parse it and create the HTML. If you do this client side, the client will have to download and execute a library to perform this task - on NPM this is currently 14kB, so not a big issue.
Both formats will almost certainly send image data to the server as Base64 strings, so neither format saves any bandwidth for image files. Base64 is less efficient for DB storage than saving binary objects, particularly if compression is used, so you could strip the images out at the server, but this will cost time spent developing and testing the backend, which may well be better spent on other things.
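To put a rough number on the Base64 overhead, here is a small Python sketch. The random bytes stand in for real image data and the data URI layout is an assumption about how the editor embeds images, so treat it as an illustration only:

```python
import base64
import os

# Stand-in for binary image data (30,000 random bytes).
raw_image = os.urandom(30_000)

# Base64 emits 4 output characters for every 3 input bytes.
encoded = base64.b64encode(raw_image)
overhead = len(encoded) / len(raw_image)  # about 1.33

# Assumed embedding: a data URI such as an <img src="..."> attribute.
data_uri = "data:image/png;base64," + encoded.decode("ascii")

# Server side, the payload after the comma can be decoded back to bytes
# and stored as a binary object instead of text.
header, _, b64_part = data_uri.partition(",")
restored = base64.b64decode(b64_part)
```

So the text form is about a third larger than the binary it represents, before any compression is applied.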
I work with a tool that contains everything within XML inside the database.
Some reports that are stored in the database use a third party tool to load, and store the main data to configure the 'report' definition in what is not a human-readable format.
I'd post it here, but it's some 130,000 bytes.
I have attempted to decode it using the popular encodings I assumed it might use, such as Base64, Base32, etc., but none of them have been able to decode the string.
Is there a way to identify what encoding a given string has, using a tool available online?
I don't have the benefit of access to the developer that built this functionality, the source code generating this string, or any documentation on it.
To give some context around what I'm trying to do - I need to reverse-engineer how a specific definition in a system is generated, so that it can be modified slightly (manually) in a text editor to support an operation that would otherwise require manually re-creating the report.
I apologize if this is the wrong exchange site for this question. I realize it's not specific to a 'programming' issue and I haven't tried to solve it using a programming language. If so, please redirect me to the appropriate place and I'll be happy to ask there instead.
Update: The text consists of strictly A-Z, 0-9 characters.
You can check the string against known encoding formats with an online tool, but only if you are sure the data is not encrypted.
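A first check you can do yourself is to look at which characters the string uses and rule out the encodings whose alphabet doesn't fit. The Python sketch below is only a heuristic: the list of candidate alphabets is my own selection, the sample string is made up, and a compressed, encrypted or custom-alphabet format will not be identified this way.

```python
import string

# Alphabets of a few common text encodings (a selection, not exhaustive).
CANDIDATE_ALPHABETS = {
    "hex (Base16)": set("0123456789ABCDEF"),
    "Base32 (RFC 4648)": set(string.ascii_uppercase + "234567="),
    "Base36": set(string.digits + string.ascii_uppercase),
    "Base64 (RFC 4648)": set(string.ascii_letters + string.digits + "+/="),
}


def compatible_encodings(text: str) -> list[str]:
    """Return the candidate encodings whose alphabet covers every character."""
    used = set(text)
    return [
        name
        for name, alphabet in CANDIDATE_ALPHABETS.items()
        if used <= alphabet
    ]


# Made-up sample using only A-Z and 0-9, as described in the update.
sample = "7G9K0QX1ZB8"
candidates = compatible_encodings(sample)
```

For a string that uses all of A-Z and 0-9, hex and standard Base32 are ruled out (Base32 has no 0, 1, 8 or 9). What remains is consistent with a base-36 style alphabet, but the character set alone cannot confirm that.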
What is the best way to prevent stored XSS?
Should every text field (even plain text) be sanitized server-side to prevent XSS, using something like the OWASP Java HTML Sanitizer Project?
Or should the client protect itself from XSS bugs by applying XSS prevention rules?
The problem with the first solution is that data may be modified (character encoding, partial or total deletion...), which can alter the behavior of the application, especially where display is concerned.
You apply sanitisation if and only if your data needs to conform to a specific format/standard and you are sure you can safely discard data; e.g. you strip all non-numeric characters from a telephone or credit card number. You always apply escaping for the appropriate context, e.g. HTML-encode user-supplied data when putting it into HTML.
Most of the time you don't want to sanitise, because you want to explicitly allow freeform data input and disallowing certain characters simply makes little sense. One of the few exceptions I see here would be if you're accepting HTML input from your users, you will want to sanitise that HTML to filter out unwanted tags and attributes and ensure the syntax is valid; however, you'd probably want to store the raw, unsanitised version in the database and apply this sanitisation only on output.
The gold standard in security is: Validate your inputs, and encode, not sanitize, your outputs.
First, validate the input server side. A good example of this would be a phone number field on a user profile. Phone numbers should only consist of digits, dashes, and perhaps a +. So why allow users to submit letters, special characters, etc? It only increases attack surface. So validate that field as strictly as you can, and reject bad inputs.
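As a minimal Python sketch of that kind of strict validation (the exact pattern and length limits are my own assumption; real phone number rules vary by country):

```python
import re

# Optional leading "+", then a digit, then 5 to 18 more digits or dashes.
# The length bounds are illustrative, not a telephony standard.
PHONE_PATTERN = re.compile(r"\+?[0-9][0-9-]{5,18}")


def is_valid_phone(value: str) -> bool:
    """Accept only strings made entirely of digits, dashes and a leading +."""
    return PHONE_PATTERN.fullmatch(value) is not None
```

Anything that fails the check is rejected outright rather than cleaned up, so `is_valid_phone("+1-202-555-0143")` passes while a value containing letters or markup does not.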
Second, encode the output according to its output context. I'd recommend doing this step server side as well, but it's relatively safe to do client side as long as you are using a good, well-tested front-end framework. The main problem with sanitization is that different contexts have different requirements for safety. To prevent XSS when you are injecting user data into an HTML attribute, directly into the page, or into a script tag, you need to do different things. So, you create specific output encoders based on the output context. Entity encode for the HTML context, JSON.stringify for the script context, etc.
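Here is a rough server-side sketch of context-specific encoding in Python, using only the standard library. The variable names are hypothetical, `json.dumps` plays the role of JSON.stringify, and the extra escaping of `<` in the script context is my own addition, because plain JSON output can still contain a closing script tag:

```python
import html
import json

# Hostile input that is stored unchanged and encoded only on output.
user_input = '<script>alert("x")</script>'

# HTML body context: entity-encode &, <, > and quotes.
html_body = "<p>" + html.escape(user_input) + "</p>"

# HTML attribute context: quotes must be encoded too (quote=True is the default).
html_attr = '<input value="' + html.escape(user_input, quote=True) + '">'

# Script context: serialise as a JSON string literal, then escape "<" so the
# value cannot terminate the surrounding <script> element.
script_literal = json.dumps(user_input).replace("<", "\\u003c")
script_block = "<script>var comment = " + script_literal + ";</script>"
```

The same stored value ends up encoded three different ways, which is why a single one-size-fits-all sanitising pass on input cannot cover every context.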
The client has to defend against this, for two reasons:
First, because this is where the vulnerability happens. You might be calling a third-party API that hasn't escaped or encoded everything. It is better not to trust anything.
Second, the API could be serving both an HTML page and an Android app. So why should the server HTML-encode what some may consider HTML tags in a request, when on the way back out the data may be going to an Android app?
I am wondering about the format in which UUIDs are represented by default in CouchDB. While RFC 4122 describes UUIDs like 550e8400-e29b-11d4-a716-446655440000, CouchDB uses continuous strings of characters like 3069197232055d39bc5bc39348a36417. I've searched for some time in both their wiki and their documentation for what this actually is, but without any result.
Do you know whether this is a non-RFC-conformant format that simply omits all the hyphens, or a completely different representation of the 128 bits?
The background is that I'm using Java UUIDs which are formatted as noted in the RFC. I see the advantage that the CouchDB-style is probably more handy for building internal trees, but I want to be sure to use a consistent implementation.
Technically we don't use the RFC standard for UUIDs, as you've noticed. Version 4 UUIDs reserve a few bits to specify the version (four bits) and variant (two bits) of the UUID. We also don't format them with the hyphens that are generally seen in other implementations.
CouchDB uuids are 16 random bytes formatted as hex. Roughly speaking that's a v4 uuid but not rfc compliant.
Regardless of the specifics, there's really not much of an issue in practice. You generally shouldn't try to interpret a uuid unless you're trying to do some sort of out-of-band analysis. CouchDB will never interpret uuids; we only rely on the properties of randomness involved therein.
Bottom line would be to not worry about it and just treat them as strings after generation.
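If you do want to move between the two spellings, it is only a matter of adding or removing hyphens. The question is about Java, but as a quick sketch in Python (`secrets.token_hex` merely imitates the "16 random bytes as hex" description above; it is not CouchDB's actual generator):

```python
import secrets
import uuid

# RFC 4122 text form, taken from the question.
rfc_form = "550e8400-e29b-11d4-a716-446655440000"

# Dropping the hyphens gives the 32-character hex form CouchDB uses.
couch_style = uuid.UUID(rfc_form).hex

# Going back only re-inserts the hyphens; the 128 bits are unchanged.
restored = str(uuid.UUID(hex=couch_style))

# Imitation of a CouchDB-style id: 16 random bytes rendered as hex.
random_id = secrets.token_hex(16)
```

Either form can be used as a document _id, since CouchDB treats it as an opaque string.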
I can provide a 2019 reference from the docs site: "it's in any case preferable to provide one's own uuids" (https://docs.couchdb.org/en/latest/best-practices/documents.html?highlight=uuid)
I ran slap bang into this because the (hobby) database I'm attempting, my first programming project of any kind, deals with an application that does generate and use RFC 4122-compliant UUIDs, and I was chewing my nails worrying about stripping the "-" characters out and putting them back on retrieval.
Then it hit me that the uuid CouchDB uses as the doc _id is a string, not a number... doh. So I use the UUID that the app generates when it creates an object as the _id of the document. No random duplicated uuids.