context window length in Codex - openai-api

Is the completion window length included in the context window length for Codex?
For da-vinci, the context window length is set to 4000 tokens.
From what I understand, if the prompt length is 3500 tokens, for example, then the remaining 500 are for the completion, and there is no way to use the whole 4000 tokens as the prompt.
I am pretty sure in my understanding, but it would be helpful to have it confirmed by someone knowledgeable.

The context length for da-vinci is 4096 tokens. The prompt tokens plus max_tokens for the response cannot together exceed the context length.
This is from the OpenAI API docs:
The token count of your prompt plus max_tokens cannot exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096).
Ref: https://platform.openai.com/docs/api-reference/completions/create

Related

Openai API continue the output of the above content

How do you get continuous output from the OpenAI API, for example when asking the GPT API to write an article? If the content is cut off, you want to ask again so that the model continues the text above. This is very easy to do in ChatGPT, but with the OpenAI API, once the earlier text is added to the prompt, it always reports an error because the token limit is exceeded. And if you don't add the earlier text, how can the model continue it?
once the earlier text is added to the prompt, it always reports an error because the token limit is exceeded.
Regarding the error about the token limit being exceeded: I also ran into this problem.
This problem has been answered in this question: OpenAI API error: "This model's maximum context length is 4097 tokens"
It has to do with the fact that your prompt tokens plus max_tokens must not exceed 4097 tokens in total.
For example, if your prompt is 200 tokens and your max_tokens is set to 4000, the request asks for 4,200 tokens in total and the API will return an error.
If your prompt is 200 tokens, your max_tokens can be set to a maximum of 3,897.
What you likely want to do here is build a system that takes the inputs and summarizes them. You can then keep track of the length of the inputs and ensure that the state of your "conversation memory" does not exceed the maximum number of tokens you want. Then, each time you make a request, you pass in the summarized/truncated version of your conversation history along with the prompt itself.
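The "conversation memory" idea can be sketched as follows. This is only an illustration of the truncation half of the suggestion (it drops the oldest turns and does not summarize anything). The names count_tokens and fit_history are hypothetical, and count_tokens is a crude whitespace stand-in for the model's real tokenizer, so actual token counts will differ.

```python
def count_tokens(text):
    # Stand-in only: real code would use the model's own tokenizer.
    return len(text.split())


def fit_history(history, prompt, context_length=4097, max_tokens=500):
    """Keep the most recent turns that fit next to the prompt and the reply."""
    # Tokens left for history after reserving the prompt and the completion.
    budget = context_length - max_tokens - count_tokens(prompt)
    kept = []
    for turn in reversed(history):  # walk from newest to oldest
        cost = count_tokens(turn)
        if cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


history = ["first long answer ...", "second answer ...", "latest answer ..."]
context = fit_history(history, "Please continue the article.")
```

The turns in context would then be joined and placed in front of the new prompt, so the request always stays within the limit.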

OpenAI API error: "This model's maximum context length is 4097 tokens"

I am making a request to the completions endpoint. My prompt is 1360 tokens, as verified by the Playground and the Tokenizer. I won't show the prompt as it's a little too long for this question.
Here is my request to OpenAI in Node.js using the openai npm package.
const response = await openai.createCompletion({
  model: 'text-davinci-003',
  prompt,
  max_tokens: 4000,
  temperature: 0.2
})
When testing in the playground my total tokens after response are 1374.
When submitting my prompt via the completions API I am getting the following error:
error: {
  message: "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (1360 in your prompt; 4000 for the completion). Please reduce your prompt; or completion length.",
  type: 'invalid_request_error',
  param: null,
  code: null
}
If you have been able to solve this one, I'd love to hear how you did it.
The model's context length is shared between the prompt and the completion; max_tokens only limits the completion. Tokens from the prompt and the completion together must not exceed the token limit of a particular GPT-3 model.
As stated on official OpenAI website:
Depending on the model used, requests can use up to 4097 tokens shared
between prompt and completion. If your prompt is 4000 tokens, your
completion can be 97 tokens at most.
The limit is currently a technical limitation, but there are often
creative ways to solve problems within the limit, e.g. condensing your
prompt, breaking the text into smaller pieces, etc.
This was solved by Reddit user 'bortlip'.
The max_tokens parameter defines the response tokens.
From OpenAI:
https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens
The token count of your prompt plus max_tokens cannot exceed the model's context length.
Therefore to solve the issue I subtract the token count of the prompt from the max_tokens and it works just fine.
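As a rough sketch of that rule, the largest max_tokens a request can carry is the context length minus the prompt's token count. The function name max_completion_tokens is hypothetical, and the 4097 default is taken from the error message above; other models have other limits.

```python
CONTEXT_LENGTH = 4097  # from the error message above; model-specific


def max_completion_tokens(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Largest max_tokens value that still fits in the context window."""
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills the context window")
    return remaining


# The 1360-token prompt from the question leaves room for 2737 tokens.
budget = max_completion_tokens(1360)
```

Any max_tokens value up to that budget is accepted; asking for 4000 with a 1360-token prompt is what triggered the error.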

What number of bytes should I use to have a safe token?

I am implementing a magic link/passwordless authentication.
I am sending an email with a token generated via crypto.randomBytes. When the user clicks the link, they are redirected to the app and the token is validated to make sure it is unique.
Does the number of bytes matter, and if yes what would be a good number?
token is validated to make sure it is unique
Maybe you could also validate that it has not yet expired (define a validity period for the token).
Does the number of bytes matter, and if yes what would be a good number?
In security, size does matter. A random value of 128 bits (16 bytes) is considered infeasible to guess; 256 bits (32 bytes) gives a comfortable safety margin.
You may also add an integrity/authentication check, such as a signature or HMAC, if the token comes from a simple random number generator (not from a serious crypto library) or from a counter.
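A minimal sketch of these points in Python (the question uses Node's crypto.randomBytes; the standard secrets module plays the same role here). The names issue_token and redeem_token, the 15-minute lifetime, the in-memory dict and the choice to store only a hash of the token are all assumptions made for illustration.

```python
import hashlib
import secrets
import time

TOKEN_BYTES = 32       # 256 bits, the "safe margin" size mentioned above
TTL_SECONDS = 15 * 60  # assumed validity period


def issue_token(store, email, now=None):
    """Create a random token, remember its hash and expiry, return the token."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(TOKEN_BYTES)  # CSPRNG, URL-safe text
    digest = hashlib.sha256(token.encode()).hexdigest()
    store[digest] = (email, now + TTL_SECONDS)
    return token


def redeem_token(store, token, now=None):
    """Return the email for a valid, unexpired token; each token works once."""
    now = time.time() if now is None else now
    digest = hashlib.sha256(token.encode()).hexdigest()
    record = store.pop(digest, None)  # removing it makes the link single-use
    if record is None:
        return None
    email, expires_at = record
    return email if now <= expires_at else None
```

Because the token comes from a cryptographic generator, no extra signature is needed in this sketch; the HMAC suggestion applies when the value is predictable, such as a counter.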

How many characters should a session key be for security?

I am generating a session key to be stored in a cookie using the following function:
function getRandomKey($length=32) {
    $string = '';
    $characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    for ($i = 0; $i < $length; $i++) {
        $string .= $characters[mt_rand(0, strlen($characters)-1)];
    }
    return $string;
}
If I were to generate a 1 digit key it would have:
26 lowercase + 26 uppercase + 10 digits (0-9) = 62 options.
Therefore an 8 digit key would have 62^8 or 218,340,105,584,896 possible combinations.
1) Is there any rule of thumb on how many characters out I should go? The more the better, I know, but is 8 enough or should it be more like 32 characters, 64 etc.?
2) Are there any security concerns when using localStorage?
Thanks in advance!
These are two very different questions.
1) TL;DR: about 16 characters (case-sensitive) is ok for most purposes.
First, please if you can, avoid implementing session management. It is already done in many frameworks, including session id generation and more - use an existing, well-known implementation if you can, because it is not straightforward to get it right.
Now, it's all about entropy. You started out right by calculating the number of possible combinations. If you take log2 of that, you get how many bits of entropy that session id has. (Well, let's not go into entropy here...)
So one case-sensitive alphanumeric character ([a-zA-Z0-9]) has log2(62)=5.9542 bits of entropy, two characters two times more, and so on.
The time required for an attacker to guess a valid session id is:
(2^b + 1) / (2 * n * s)
Where 'b' is the available bits of entropy in the session id, 'n' is the number of guesses the attacker can make every second, and 's' is the number of valid session ids in the system.
In a large, distributed web application, potentially using a botnet, an attacker may be able to make n=100000 guesses a second, and there may be s=1 million valid session ids. You want the result to be several hundred years at the very least, say 300 (9460800000 seconds). (These are totally arbitrary values.)
This gives about b=70, so you need 70 bits of entropy. If each character has 5.9542 bits of entropy as discussed above, it gives about 12 for the required session id length, but you can just round it up to 16 to make sure. :)
As a rule of thumb, it is sometimes assumed that the number of bits of entropy in a session id is half the length (in bits) of that session id. It is mostly a reasonable approximation without any calculation. :) Even more so, because session ids are sometimes actual random numbers, base64 or otherwise encoded. Different encodings usually give different results though.
Also make sure to use a cryptographic random number generator, otherwise entropy is much less. Note that mt_rand() is not cryptographically random, so the code in your question is vulnerable!
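The calculation above can be reproduced with a few lines of Python. This is a sketch under the same arbitrary numbers the answer uses, with 300 years taken as 365-day years; the function names are hypothetical. It also shows one way to draw the characters from a cryptographic generator (Python's secrets module) rather than from something like mt_rand().

```python
import math
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def bits_per_char(alphabet_size=len(ALPHABET)):
    return math.log2(alphabet_size)


def required_bits(guesses_per_second, valid_ids, seconds):
    # Solve (2^b + 1) / (2 * n * s) = t for b.
    return math.log2(2 * guesses_per_second * valid_ids * seconds - 1)


def required_length(bits):
    return math.ceil(bits / bits_per_char())


def new_session_id(length=16):
    # secrets.choice draws from the operating system's CSPRNG.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


seconds = 300 * 365 * 24 * 3600
bits = required_bits(100_000, 1_000_000, seconds)  # a little under 71
length = required_length(bits)                     # 12 characters
```

Rounding the 12 characters up to 16 gives about 95 bits, which is the safety margin the answer recommends.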
2) TL;DR Yes. (I suppose you mean using local storage for storing the session id.)
The best possible place to store a session id is an HttpOnly, Secure cookie without an expiration (non-persistent), because JavaScript cannot access it there (for example, cross-site scripting doesn't affect a victim user's session id at least), and being non-persistent, it will be removed when the user closes the browser and will not be persisted to disk (well, mostly... but that's a long story).
If you use localStorage, any XSS will directly affect the session id, which is very valuable for an attacker. Also, sessions will survive closing the browser, which is slightly unexpected: user sessions might easily be hijacked on shared computers.
Note though that this depends on the use-case and the risk you want to take. While it would definitely not be ok for a financial application where you can access and manage very sensitive data, it can be ok for less risky applications. You can also let the user decide ("remember me", in which case you put it into localStorage), but most users are not aware of the associated risk, so they can't make an informed decision.
Also note that sessionStorage is a little better, because the session id will be removed from the browser when it is closed, but it is still available to Javascript (XSS).
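For illustration, here is how such a cookie could be built with Python's standard http.cookies module. The cookie name "session" and the helper name are assumptions; most web frameworks expose the same flags through their own API.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build the value of a Set-Cookie header for a non-persistent session."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # not readable from JavaScript
    cookie["session"]["secure"] = True    # only sent over HTTPS
    cookie["session"]["path"] = "/"
    # No expires/max-age attribute: the browser drops it when it closes.
    return cookie["session"].OutputString()


header = session_cookie_header("abc123")
```

The resulting string is sent as the value of a Set-Cookie response header.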

Why should checking a wrong password take longer than checking the right one?

This question has always troubled me.
On Linux, when asked for a password, if your input is the correct one, it checks right away, with almost no delay. But, on the other hand, if you type the wrong password, it takes longer to check. Why is that?
I observed this in all Linux distributions I've ever tried.
It's actually to prevent brute force attacks from trying millions of passwords per second. The idea is to limit how fast passwords can be checked and there are a number of rules that should be followed.
A successful user/password pair should succeed immediately.
There should be no detectable difference between the reasons for failure.
That last one is particularly important. It means no helpful messages like:
Your user name is correct but your password is wrong, please try again
or:
Sorry, password wasn't long enough
Not even a time difference in response between the "invalid user and password" and "valid user but invalid password" failure reasons.
Every failure should deliver exactly the same information, textual and otherwise.
Some systems take it even further, increasing the delay with each failure, or only allowing three failures then having a massive delay before allowing a retry.
This makes it take longer to guess passwords.
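A minimal sketch of these rules, assuming a hypothetical check_password callback and a delay that doubles with each consecutive failure (the doubling and the 3-second base are illustrative choices, not what any particular system does):

```python
import time


def make_login(check_password, base_delay=3.0, sleep=time.sleep):
    """Wrap a password check so that every failure is followed by a delay."""
    failures = {}  # username -> consecutive failed attempts

    def login(username, password):
        if check_password(username, password):
            failures.pop(username, None)
            return True  # a correct pair succeeds immediately
        count = failures.get(username, 0) + 1
        failures[username] = count
        # Same result and same delay rule whatever the reason for failure,
        # including usernames that do not exist.
        sleep(base_delay * 2 ** (count - 1))
        return False

    return login
```

The sleep function is passed in only so the behaviour can be exercised without actually waiting.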
I am not sure, but it is quite common to integrate a delay after entering a wrong password to make attacks harder. This makes an attack practically infeasible, because it will take a long time to check even a few passwords.
Even trying a few passwords - birthdates, the name of the cat, and things like that - is turned into no fun.
Basically to mitigate against brute force and dictionary attacks.
From The Linux-PAM Application Developer's Guide:
Planning for delays
extern int pam_fail_delay(pam_handle_t *pamh, unsigned int micro_sec);
This function is offered by Linux-PAM
to facilitate time delays following a
failed call to pam_authenticate() and
before control is returned to the
application. When using this function
the application programmer should
check if it is available with,
#ifdef PAM_FAIL_DELAY
....
#endif /* PAM_FAIL_DELAY */
Generally, an application requests
that a user is authenticated by
Linux-PAM through a call to
pam_authenticate() or pam_chauthtok().
These functions call each of the
stacked authentication modules listed
in the relevant Linux-PAM
configuration file. As directed by
this file, one or more of the modules
may fail causing the pam_...() call to
return an error. It is desirable for
there to also be a pause before the
application continues. The principal
reason for such a delay is security: a
delay acts to discourage brute force
dictionary attacks primarily, but also
helps hinder timed (covert channel)
attacks.
It's a very simple, virtually effortless way to greatly increase security. Consider:
System A has no delay. An attacker has a program that creates username/password combinations. At a rate of thousands of attempts per minute, it takes only a few hours to try every combination and record all successful logins.
System B generates a 5-second delay after each incorrect guess. The attacker's efficiency has been reduced to 12 attempts per minute, effectively crippling the brute-force attack. Instead of hours, it can take months to find a valid login. If hackers were that patient, they'd go legit. :-)
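The arithmetic behind the comparison can be checked quickly. The candidate count of one million and the rate of 5,000 attempts per minute for System A are made-up figures for illustration; only the 5-second delay comes from the text.

```python
def attempts_per_minute(delay_seconds):
    return 60 / delay_seconds


def days_to_try_all(candidates, per_minute):
    return candidates / per_minute / (60 * 24)


CANDIDATES = 1_000_000  # hypothetical number of username/password pairs

system_a = days_to_try_all(CANDIDATES, 5_000)                   # no delay
system_b = days_to_try_all(CANDIDATES, attempts_per_minute(5))  # 5 s delay
```

With these assumed numbers System A is exhausted in a little over three hours, while System B needs about 58 days.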
Failed authentication delays are there to reduce the rate of login attempts. The idea is that if somebody is trying a dictionary or brute-force attack against one or many user accounts, the attacker will be required to wait out the fail delay, forcing him to take more time and giving you more chance to detect it.
You might also be interested in knowing that, depending on what you are using as a login shell, there is usually a way to configure this delay.
In GDM, the delay is set in the gdm.conf file (usually in /etc/gdm/gdm.conf). You need to set RetryDelay=x where x is a value in seconds.
Most Linux distributions these days also support having FAIL_DELAY defined in /etc/login.defs, allowing you to set a wait time after a failed login attempt.
Finally, PAM also allows you to set a nodelay attribute on your auth line to bypass the fail delay. (Here's an article on PAM and linux)
I don't see that it can be as simple as the responses suggest.
If response to a correct password is (some value of) immediate, don't you only have to wait until longer than that value to know the password is wrong? (at least know probabilistically, which is fine for cracking purposes) And anyway you'd be running this attack in parallel... is this all one big DoS welcome mat?
What I tried before appeared to work, but actually did not; if you care you must review the wiki edit history...
What does work (for me) is, to both lower the value of pam_faildelay.so delay=X in /etc/pam.d/login (I lowered it to 500000, half a second), and also add nodelay (preceded by a space) to the end of the line in common-auth, as described by Gabriel in his answer.
auth [success=1 default=ignore] pam_unix.so nullok_secure nodelay
At least for me (debian sid), only making one of these changes will not shorten the delay appreciably below the default 3 seconds, although it is possible to lengthen the delay by only changing the value in /etc/pam.d/login.
This kind of crap is enough to make a grown man cry!
On Ubuntu 9.10, and I think newer versions too, the file you're looking for is located at
/etc/pam.d/login
edit the line:
auth optional pam_faildelay.so delay=3000000
replacing the 3 with whatever number you want (the value is in microseconds, so 3000000 is 3 seconds).
Note that to have a 'nodelay' authentication, I THINK you should edit the file
/etc/pam.d/common-auth
too. On the line:
auth [success=1 default=ignore] pam_unix.so nullok_secure
add 'nodelay' to the end (without quotes).
But this final explanation about the 'nodelay' is what I think.
I would like to add a note from a developer's perspective. Though this wouldn't be obvious to the naked eye, a smart developer would break out of a match query when the match is found. As a result, a successful match would complete faster than a failed match, because the matching function would compare the credentials to all known accounts until it finds the correct match. In other words, let's say there are 1,000,000 user accounts in order by IDs; 001, 002, 003 and so on. Your ID is 43,001. So, when you put in a correct username and password, the scan stops at 43,001 and logs you in. If your credentials are incorrect then it scans all 1,000,000 records. The difference in processing time on a dual core server might be in the milliseconds. On Windows Vista with 5 user accounts it would be in the nanoseconds.
I agree. This is an arbitrary programming decision. Putting the delay to one second instead of three doesn't really hurt the crackability of the password, but makes it more user-friendly.
Technically, this deliberate delay is to prevent attacks like the "Linearization attack" (there are other attacks and reasons as well).
To illustrate the attack, consider a program (without this
deliberate delay), which checks an entered serial to see whether it
matches the correct serial, which in this case happens to be
"xyba". For efficiency, the programmer decided to check one
character at a time and to exit as soon as an incorrect character is
found. Before the comparison begins, the lengths are also checked.
A serial of the correct length will take longer to process than one of incorrect length. Even better (for the attacker), a serial number that has the first character correct will take longer than any that has an incorrect first character. The waiting time increases in steps because each additional correct character means one more loop iteration and comparison to go through.
So, the attacker can try four-character strings and, by guesswork, find that the strings beginning with x take the most time.
The attacker can then fix the first character as x and vary the second character, in which case they will find that y takes the longest.
The attacker can then fix the first two characters as xy and vary the third character, in which case they will find that b takes the longest.
The attacker can then fix the first three characters as xyb and vary the fourth character, in which case they will find that a takes the longest.
Hence, the attacker can recover the serial one character at a time.
The serial number is four characters long and each character has 128 possible values. Then there are 128^4 = 2^28 = 268,435,456 possible serials. If the attacker must randomly guess complete serial numbers, she would guess the serial number in about 2^27 = 134,217,728 tries, which is an enormous amount of work. On the other hand, by using the linearization attack above, an average of only 128/2 = 64 guesses are required for each letter, for a total expected work of about 4 * 64 = 2^8 = 256 guesses, which is a trivial amount of work.
Much of the written material is adapted from this (taken from Mark Stamp's "Information Security: Principles and Practice"). Also, the calculations above do not take into account the amount of guesswork needed to figure out the correct serial length.
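The attack can be simulated without a stopwatch. In the sketch below, leaky_check returns how many character comparisons it performed, which stands in for the elapsed time an attacker would measure; real timing measurements are far noisier. The alphabet is restricted to 26 lowercase letters rather than 128 values, the serial length is assumed known, and all function names are hypothetical. This is not the book's Linearization.java.

```python
import hmac
import string

SECRET = "xyba"  # the serial from the example above


def leaky_check(guess, secret=SECRET):
    """Early-exit comparison; steps is a stand-in for elapsed time."""
    if len(guess) != len(secret):
        return False, 0
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps  # stops at the first wrong character
    return True, steps


def recover(length, alphabet=string.ascii_lowercase):
    """Recover the serial one position at a time from the step counts."""
    known, tries = "", 0
    for pos in range(length):
        best_char, best_steps = None, -1
        for c in alphabet:
            guess = known + c + alphabet[0] * (length - pos - 1)
            ok, steps = leaky_check(guess)
            tries += 1
            if ok:
                return guess, tries
            if steps > best_steps:  # the slowest guess has the right character
                best_char, best_steps = c, steps
        known += best_char
    return known, tries


def safe_check(guess, secret=SECRET):
    # Constant-time comparison from the standard library.
    return hmac.compare_digest(guess.encode(), secret.encode())


serial, tries = recover(len(SECRET))
```

The number of tries stays at or below 4 * 26, compared with 26^4 = 456,976 candidates for blind guessing. Comparing with hmac.compare_digest removes the per-character signal this sketch relies on.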
