Why does the PSI API put the values of originLoadingExperience in loadingExperience when CrUX does not have sufficient real-world speed data for the page? - pagespeed-insights

When the Chrome User Experience Report (CrUX) does not have sufficient real-world speed data for the page, the PSI API (v5) response still contains values for all properties of loadingExperience, and those values are exactly the same as the values of originLoadingExperience.
The problem with this is that from the API response you can't tell whether the data in loadingExperience is valid or a duplicate of originLoadingExperience.
Sure, it's possible to compare all those values, and if they all match exactly it is reasonably safe to conclude it's a case of duplication, but this is not bullet-proof and requires extra code.
Is there a way to reliably tell from the API response that CrUX did not have sufficient data for the page?

Yes, I've seen this too and I consider it a bug in the PSI API. I've already flagged it with the PSI team, suggesting that they include an error message in the loadingExperience field to clarify that it doesn't have sufficient data for the page.
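Until that happens, the only workaround I'm aware of is the comparison the question already describes. A rough Python sketch (the runPagespeed URL is the v5 endpoint; treating an exact match of the two metrics objects as "no page-level CrUX data" is a heuristic, not something the API guarantees):

import requests

PSI_V5 = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def has_page_level_crux_data(url, api_key):
    resp = requests.get(PSI_V5, params={"url": url, "key": api_key}).json()
    page = resp.get("loadingExperience", {})
    origin = resp.get("originLoadingExperience", {})
    # Heuristic: if every metric in loadingExperience is identical to
    # originLoadingExperience, assume the origin data was duplicated.
    return page.get("metrics") != origin.get("metrics")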

Related

REST API: what is the proper way of naming URIs using nouns and not verbs?

I have a login and I need to do something like this:
1. POST .../authentication/login (will ask for a phone and a password)
2. POST .../authentication/verifyToken (will ask for an auth token)
3. POST .../authentication/forgotPassword (will ask for a phone and a password)
But as I read, this structure is not good since it contains verbs rather than nouns.
I have tried to make something like this instead:
1. POST .../sessions/new (will create a new token, given correct phone and password credentials)
2. GET .../sessions/:token (will verify whether the token is valid or expired)
3. GET .../sessions/forgot (will send an SMS with a new password or a temporary password reset code)
This first method is not REST, but it's totally clear.
You can read the URL and understand exactly what it is going to do. You do not need any kind of explanation.
However, more and more articles say that verbs in REST URLs are not RESTful and thus not a good practice.
What is the proper way to handle this?
REST doesn't care what spelling you use for your resource identifiers.
But as I read, this structure is not good since it contains verbs rather than nouns.
The structure is fine.
You can read the URL and understand exactly what it is going to do.
Right -- REST does not care if you can read the URL and understand what it is going to do.
URL/URI are identifiers, in the same way that variable names in your program are identifiers. Your compiler doesn't particularly care whether your variable is named accountBalance, purpleMonkeyDishwasher, or x. The spellings described in our coding standards are for human readers, and the same is true for hackable urls.
However, more and more articles say that verbs in REST URLs are not RESTful and thus not a good practice.
REST is an architectural style
As an architectural style for network-based applications, its definition is presented in the dissertation incrementally, as an accumulation of design constraints that derive from nine pre-existing architectural styles and five additional constraints unique to the Web.
Fielding describes the constraints in his thesis; anyone who claims that verbs in identifiers "are not RESTful" should be able to point to a specific constraint that they violate.
I have yet to see a compelling argument that any of the REST constraints are violated by identifier spellings.
First, I would not use the term session. This implies server-side state, which is problematic for the stateless communication that REST requires.
So your problem could be solved by modeling resources, probably like:
GET ./authentication/token
to get a token if valid credentials are provided in the request headers.
GET ./authentication/password
to get a new temporary password if an e-mail address is provided in the request headers.
You can also use POST in order to transport the values in the request body.
Be aware that the service should answer with an HTTP 204 if the result is sent by SMS.
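For illustration, a minimal sketch of that resource modeling (Python/Flask purely as an example; the credential check and the SMS helper are hypothetical stand-ins, and POST is used so the values travel in the request body as suggested above):

import secrets
from flask import Flask, request, jsonify

app = Flask(__name__)

def valid_credentials(phone, password): ...       # hypothetical lookup against the user store
def send_reset_code_by_sms(phone): ...            # hypothetical SMS gateway call

@app.route("/authentication/token", methods=["POST"])
def create_token():
    body = request.get_json()
    if not valid_credentials(body.get("phone"), body.get("password")):
        return "", 401
    return jsonify({"token": secrets.token_urlsafe(32)}), 201

@app.route("/authentication/password", methods=["POST"])
def reset_password():
    send_reset_code_by_sms(request.get_json().get("phone"))
    return "", 204                                 # result is delivered by SMS, hence 204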

Shaping API endpoints and response

Hello Stackoverflow,
I've been writing APIs for quite a while now, and I've recently come to work on one of the bigger ones. I started wondering how to shape this API, because on bigger platforms I've often seen one big entity (for example, a product page in a shop) loaded in separate pieces (we can see that the item body has loaded, but the comments are still being fetched, etc.).
Usually what I've done was attach the comments as a relation in the SQL query, so my frontend queried a single API endpoint like:
http://api.example.com/items/:id
And it returned all the necessary data like seller info, photos, etc.
Logically, seller info and photos are small pieces of data (an item can only have one seller and, say, no more than 10 photos), but the comments might be a much larger collection with a relationship of their own (comment author).
Does it make sense to separate one endpoint into 2 independent endpoints like:
http://api.example.com/items/:id
http://api.example.com/items/:id/comments
What are the downsides of this approach? Is it common practice? Or have I misunderstood some concept?
One downside might be that two requests are performed, but on the other hand the first endpoint should return data faster (as it's lighter than also fetching n comments), so the page might be displayed sooner, with a spinner for the comments section. This way I'll be able to paginate the comments too.
Are there any improvements that might be added to this separation of endpoints? Or am I totally wrong and it should be done a totally different way?
I think it is a good approach if:
1. The number of comments on one item can be large, because with this approach you can paginate them more easily.
2. You are going to need to access the comments of one item without needing the rest of the item information.
I think either of these conditions justifies this decision, and yes, it is a common approach.
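For what it's worth, a minimal sketch of the split (Python/Flask as an illustration; get_item and get_comments are hypothetical data-layer helpers), with limit/offset on the comments endpoint so it can be paginated independently of the item:

from flask import Flask, request, jsonify

app = Flask(__name__)

def get_item(item_id): ...                       # hypothetical: item body, seller, photos (no comments)
def get_comments(item_id, limit, offset): ...    # hypothetical: one page of comments with authors

@app.route("/items/<int:item_id>")
def item(item_id):
    return jsonify(get_item(item_id))            # light payload, returns quickly

@app.route("/items/<int:item_id>/comments")
def item_comments(item_id):
    limit = min(int(request.args.get("limit", 20)), 100)   # cap the page size
    offset = int(request.args.get("offset", 0))
    return jsonify(get_comments(item_id, limit=limit, offset=offset))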

Standard way of encoding pagination info in a RESTful URL GET?

I think my question would be better explained with a couple of examples...
GET http://myservice/myresource/?name=xxx&country=xxxx&_page=3&_page_len=10&_order=name asc
That is, on the one hand I have conditions (name=xxx&country=xxxx) and on the other hand I have parameters affecting the query (_page=3&_page_len=10&_order=name asc).
Now, I thought about using some special prefix ("_" in this case) to avoid collisions between conditions and parameters (what if my resource has an "order" property?).
Is there some standard way to handle these situations?
--
I found this example (just to pick one)
http://www.peej.co.uk/articles/restfully-delicious.html
GET http://del.icio.us/api/peej/bookmarks/?tag=mytag&dt=2009-05-30&start=1&end=2
but in this case the condition fields are already defined (there is no start or end property)
I'm looking for a more general solution...
--
edit, a more detailed example to clarify
Each item is completely independent of the others... let's say that my resources are customers, and that (luckily) I have a couple of million of them in my db.
So the URL could be something like
http://myservice/customers/?country=argentina,last_operation=2009-01-01..2010-01-01
It should give me all the customers from Argentina that bought anything in the last year.
Now I'd like to use this service to build a browse page, or to fill a combo box with Ajax, for example, so the idea was to add some metadata to control what info I should get.
to build the browse page I would add
http://...,_page=1,_page_len=10,_order=state,name
and to fill an autosuggest combo with ajax
http://...,_page=1,_page_len=100,_order=state,name,name=what_ever_type_the_user*
to fill the combo with the first 100 customers matching what the user typed...
My question was whether there is some standard (written or not) way of encoding this kind of stuff in a RESTful URL manner...
While there is no standard, Web API Design (by Apigee) is a great book of advice when creating Web APIs. I treat it as a sort of standard, and follow its recommendations whenever I can.
Under "Pagination and partial response" they suggest (page 17):
Use limit and offset
We recommend limit and offset. It is more common, well understood in leading databases, and easy for developers.
/dogs?limit=25&offset=50
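A sketch of what limit/offset can look like on the server, assuming the collection lives in SQL (sqlite3 is used only for illustration; the table and column names are made up):

import sqlite3

def fetch_dogs(conn: sqlite3.Connection, limit=25, offset=0, max_limit=100):
    # Clamp client-supplied values so a huge limit can't overload the database.
    limit = max(1, min(int(limit), max_limit))
    offset = max(0, int(offset))
    cur = conn.execute(
        "SELECT id, name FROM dogs ORDER BY name LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cur.fetchall()

# /dogs?limit=25&offset=50 maps to fetch_dogs(conn, limit=25, offset=50)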
There's no standard or convention which defines a way to do this, but using underscores (one or two) to denote meta-info isn't a bad idea. This is what's used to specify member variables by convention in some languages.
Note:
I started writing this as a comment to my previous answer. Then I was going to add it as an edit, but I think it belongs as a separate answer instead, since it is a completely different approach and stands in its own right.
The more that I have been thinking about this, I think that you really have two different resources that you have to deal with:
A page of resources
Each resource that is collected into the page
I may have missed something (could be... I've been guilty of misinterpretation). Since a page is a resource in its own right, the paging meta-information is really an attribute of the resource so placing it in the URL isn't necessarily the wrong approach. If you consider what can be cached downstream for a page and/or referred to as a resource in the future, the resource is defined by the paging attributes and the query parameters so they should both be in the URL. To continue with my entirely too lengthy response, the page resource would be something like:
http://.../myresource/page-10/3?name=xxx&country=yyy&order=name&orderby=asc
I think that this gets to the core of your original question. If the page itself is a resource, then the URI should describe the page, so something like page-10 is my way of saying "a page of 10 items", and the next portion of the path is the page number. The query portion contains the filter.
The other resource names each item that the page contains. How the items are identified should be controlled by what the resources are. I think that a key question is whether the result resources stand on their own or not. How you represent the item resources differs based on this concept.
If the item representations are only appropriate when in the context of the page, then it might be appropriate to include the representation inline. If you do this, then identify them individually and make sure that you can retrieve them using either URI fragment syntax or an additional path element. It seems that the following URLs should result in the fifth item on the third page of ten items:
http://.../myresource/page-10/3?...#5
http://.../myresource/page-10/3/5?...
The largest factor in deciding between these two is how strongly coupled the individual item is with the page. The fragment syntax is considerably more binding than the path element IMHO.
Now, if the item resources are free-standing and the page is simply the result of a query (which I think is likely the case here), then the page resource should be an ordered list of URLs for each item resource. The item resource should be independent of the page resource in this case. You might want to use a URI that is based on the identifying attribute of the item itself. So you might end up with something like:
http://.../myresource/item/42
http://.../myresource/item/307E8599-AD9B-4B32-8612-F8EAF754DFDB
The key deciding factor is whether the items are freestanding resources or not. If they are not, then they are derived from the page URI. If they are freestanding, then they should be defined by their own URIs and should be included in the page resource as links instead.
I know that the RESTful folk tend to dislike the usage of HTTP headers, but has anyone actually looked into using HTTP ranges to solve pagination? I wrote an ISAPI extension a few years back that included pagination information along with other non-property information in the URI, and I never really liked the feel of it. I was thinking about doing something like:
GET http://...?name=xxx&country=xxxx&_orderby=name&_order=asc HTTP/1.1
Range: pageditems=20-29
...
This puts the result set parameters (e.g., _orderby and _order) in the URI and the selection as a Range header. I have a feeling that most HTTP implementations would screw this up though especially since support for non-byte ranges is a MAY in RFC2616. I started thinking more seriously about this after doing a bunch of work with RTSP. The Range header in RTSP is a nice example of extending ranges to handle time as well as bytes.
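A rough sketch of what reading such a header could look like (Python/Flask for illustration; "pageditems" is the made-up range unit from the example above, not a registered one, and load_items is a hypothetical query helper):

import re
from flask import Flask, request, jsonify

app = Flask(__name__)

def load_items(offset, count): ...          # hypothetical: apply the URI filters, return `count` items

@app.route("/myresource/")
def paged_items():
    m = re.fullmatch(r"pageditems=(\d+)-(\d+)", request.headers.get("Range", ""))
    if not m:
        return jsonify(load_items(0, 20))   # no (or unrecognized) Range header: default page
    start, end = int(m.group(1)), int(m.group(2))
    # 206 Partial Content, echoing the range that was satisfied.
    return jsonify(load_items(start, end - start + 1)), 206, {"Content-Range": f"pageditems {start}-{end}"}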
I guess another way of handling this is to make a separate request for each item on the page as an individual resource in its own right. If your representation allows for this, then you might want to consider it. It is more likely that intermediate caching would work very well with this approach. So your resources would be defined as:
myresource/name=xxx;country=xxx/orderby=name;order=asc/20/
myresource/name=xxx;country=xxx/orderby=name;order=asc/21/
myresource/name=xxx;country=xxx/orderby=name;order=asc/22/
myresource/name=xxx;country=xxx/orderby=name;order=asc/23/
myresource/name=xxx;country=xxx/orderby=name;order=asc/24/
I'm not sure if anyone has tried something like this or not. It would make URIs constructible, which is always a useful property IMHO. The bonus to this approach is that the individual responses could be cached and the server is free to optimize the handling of collecting pages of items and whatnot in the most efficient way. The basic idea is to have the client specify the query in the URI and the index of the item that it wants to retrieve. No need to push the idea of a "page" into the resource or even make it visible. The client can iteratively retrieve objects until its page is full or it receives a 404.
There is a downside of course... the HTTP server and infrastructure have to support pipelining, or the cost of creation/destruction of connections might kill the idea outright.
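A client-side sketch of that last idea (Python/requests; the base URI just mirrors the made-up examples above): fetch item indexes one by one under a fixed query until a 404 says the result set is exhausted, reusing one connection to soften the setup/teardown cost just mentioned:

import requests

BASE = "http://example.com/myresource/name=xxx;country=xxx/orderby=name;order=asc"

def fetch_page(start, size):
    items = []
    with requests.Session() as s:          # keep-alive: one connection for the whole page
        for index in range(start, start + size):
            r = s.get(f"{BASE}/{index}/")
            if r.status_code == 404:       # ran past the end of the result set
                break
            items.append(r.json())
    return items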

How can I prevent bulk vulnerability scanning without using a CAPTCHA component?

How can I prevent forms from being scanned by mass vulnerability scanners like XSSME, SQLinjectMe (those two are free Firefox add-ons), Accunetix Web Scanner and others?
These "web vulnerability scanners" work by catching a copy of a form with all its fields and sending thousands of tests within minutes, injecting all kinds of malicious strings into the fields.
Even if you sanitize your input very well, there is a response delay on the server, and sometimes, if the form sends e-mail, you will receive thousands of e-mails in the receiving mailbox. I know that one way to reduce this problem is to use a CAPTCHA component, but sometimes this kind of component is too much for some types of forms and slows down the user (for example on a login/password form).
Any suggestion?
Thanks in advance and sorry for my English!
Hmm, if this is a major problem you could add a server-side submission-rate limiter. When someone submits a form, store some information in a database about their IP address and what time they submitted the form. Then whenever someone submits the form, check the database to see if it's been "long enough" since the last time that IP address submitted the form. Even a fairly short wait like 10 seconds would seriously slow down this sort of automated probing. This database could be automatically cleared out every day/hour/whatever; you don't need to keep the data around for long.
Of course someone with access to a botnet could avoid this limiter, but if your site is under attack by a large botnet you probably have larger problems than this.
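A bare-bones sketch of that limiter, kept in memory just for illustration (the database table described above, or something like Redis, would be the real store):

import time

WINDOW_SECONDS = 10          # "long enough" between submissions from the same IP
_last_seen = {}              # ip -> timestamp of the last accepted submission

def allow_submission(ip):
    now = time.time()
    last = _last_seen.get(ip)
    if last is not None and now - last < WINDOW_SECONDS:
        return False         # too soon: reject (or silently drop) the submission
    _last_seen[ip] = now
    return True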
On top of the rate-limiting solutions that others have offered, you may also want to implement some logging or auditing on sensitive pages and forms to make sure that your rate limiting actually works. It could be something simple like logging request counts per IP. Then you can send yourself an hourly or daily digest to keep an eye on things without having to repeatedly check your site.
There's only so much you can do... "Where there's a will there's a way"; anything that you want the user to do can be automated and abused. You need to find a middle ground when developing, and toss in a few things that make abuse harder.
One thing you can do is sign the form with a hash. For example, if the form is there for sending a message to another user, you can do this:
hash = md5(userid + action + salt)
then when you actually process the response you would do
if (hash == md5(userid + action + salt))
This prevents the abuser from injecting thousands of user IDs and easily spamming your system. It's just another hoop for the attacker to jump through.
I'd love to hear other people's techniques. CAPTCHAs should be used on entry points like registration, and the method above should be used on actions against specific things (messaging, voting, ...).
Also, you could create a flagging system: anything the user does X times in X amount of time that looks fishy flags the user and makes them do a CAPTCHA (once they enter it they are no longer flagged).
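A runnable version of the signing idea above, in Python (hashlib stands in for the pseudocode md5() call; the salt is assumed to be a server-side secret, and an HMAC would be a stronger choice than a plain salted MD5):

import hashlib

SALT = "long-random-server-side-secret"    # assumption: never shipped to the client

def sign(userid, action):
    return hashlib.md5(f"{userid}:{action}:{SALT}".encode()).hexdigest()

# The form carries sign(userid, "message") in a hidden field;
# on submission, recompute it and compare before doing any work.
def verify(userid, action, submitted_hash):
    return submitted_hash == sign(userid, action)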
This question is not exactly like the other questions about CAPTCHAs, but I think reading them, if you haven't already, would be worthwhile. "Honey Pot Captcha" sounds like it might work for you.
Practical non-image based CAPTCHA approaches?
What can be done to prevent spam in forum-like apps?
Reviewing all the answers, I put together one solution customized for my case with a little bit of each one:
I checked the behavior of the known vulnerability scanners again. They load the page once and, with the information gathered, start submitting it over and over, changing the content of the fields to malicious scripts in order to test for certain types of vulnerabilities.
But: what if we sign the form? How? By creating a hidden field with random content stored in the Session object. If the value is submitted more than n times we just create it again. We only have to check whether it matches, and if it doesn't, we take whatever action we want.
But we can do even better: why not, instead of changing the value of the field, change the name of the field randomly? Changing the field name randomly and storing it in the session object is perhaps a trickier solution, because the form is always different, and the vulnerability scanners only load it once. If we don't get input for a field with the stored name, we simply don't process the form.
I think this can save a lot of CPU cycles. I did some tests with the vulnerability scanners mentioned in the question and it works perfectly!
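As a concrete illustration of the random-field-name trick (Python/Flask and its session object, purely as an example; process_message is hypothetical):

import secrets
from flask import Flask, request, session

app = Flask(__name__)
app.secret_key = "change-me"                  # required for the session cookie

def process_message(text): ...                # hypothetical handler for the real form action

@app.route("/contact", methods=["GET"])
def show_form():
    field = "f_" + secrets.token_hex(8)       # a different field name on every page load
    session["msg_field"] = field
    return f'<form method="post"><input name="{field}"><button>Send</button></form>'

@app.route("/contact", methods=["POST"])
def submit_form():
    field = session.get("msg_field")
    if not field or field not in request.form:
        return "", 400                        # a replayed or stale form: don't process it
    process_message(request.form[field])
    return "", 204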
Well, thanks a lot to all of you; as I said before, this solution was made with a little bit of each answer.

When is it OK to intentionally obfuscate URLs?

Having friendly URLs is generally a good thing. However, there are times when it seems like a bad idea. What is your rule of thumb?
For instance, consider a situation where I want to show a Registration Success page. I want all of the underlying logic to be the same. However, depending on how they registered, I may want to display a different message for someone who registered under a certain type of role.
Here are a few, off-the-cuff examples of "hackable" (as described in link) URLs:
http://www.example.com/RegistrationSuccess.aspx?IsCertainRole=true
http://www.example.com/RegistrationSuccess.aspx?role=CertainRole
http://www.example.com/RegistrationSuccess.aspx?r=2876
All of these seem bad since I don't want the URLs to be discoverable. On the other hand, I hate to do something more complex just to modify the success message slightly.
How would you handle this?
Bear in mind that obfuscating URLs is NOT a security measure. You should never trust outside input - filter, sanitize and implement restrictive logic. No matter how clever you believe your obfuscation scheme to be, people have cracked much more complicated security schemes with relative ease.
As a general rule of thumb - there is no good reason to obfuscate URLs intentionally. Use URLs to communicate read operations (a path to a resource). Use POST requests to communicate write operations (adding/modifying data). If a user isn't supposed to be able to do something through the URL, it should be regulated server side and through the request method.
You can either POST the data, or, if that's not an option, set the value in a Session variable and then read the value in the success page. The actual complexity of code using the Session is about the same as using the query string.
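A sketch of the session approach (Python/Flask rather than ASP.NET, purely to make the flow concrete): the registration handler stores the role server-side, and the success page picks the message from the session, so nothing rides in the URL:

from flask import Flask, session, redirect

app = Flask(__name__)
app.secret_key = "change-me"                 # required for the session cookie

@app.route("/register", methods=["POST"])
def register():
    # ... create the account, then remember the role server-side ...
    session["registered_role"] = "CertainRole"
    return redirect("/RegistrationSuccess")

@app.route("/RegistrationSuccess")
def registration_success():
    role = session.pop("registered_role", None)
    if role is None:
        return redirect("/")                 # no fresh registration: nothing to see here
    if role == "CertainRole":
        return "Thanks for registering! (special-role message)"
    return "Thanks for registering!"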
OK, if you don't think this is a security issue, since you are only displaying a different message, then why do you care whether it's hackable or not?
Most users wouldn't notice the URL is editable, so why obfuscate? The "elite hackers" will get a slightly different message; big deal.
The general answer to "Should I obfuscate...?" is no. If it's for security, hell no; otherwise, why are you obfuscating? Most likely, you are wasting time.
URLs are for uniquely referencing content. When the contents are the results of a process that involves several steps of dialogue, those contents can't really have a URL, because the URL does not reproduce the process.
I would forward them to RegistrationSuccess.aspx and present contents there based on the state of the session.
If somebody comes to that URL without the suitable session state, I would forward them to the front page after 5 seconds of looking at a friendly message explaining that there is nothing to see.
A better choice yet may be to forward them to MyRegistration.aspx, which is something they would perhaps like to make a favourite of. Coming from the Registration process, it may have a box explaining that they have successfully registered. If they are not coming from the Registration process, this box is not presented. The rest of the page is the summary of all earlier Registration processes for that user.
With a POST submission?
If you don't want information in the URL, don't put it in the URL.
It's not always easy to do...
I would say that pages you want to be easily indexed by the search engines should use URL routing. This includes high-traffic pages.
For other pages, which users will only visit a few times a month or year, you can leave the URLs as normal.
If you must absolutely use the URL for private/personalized data, you'd probably be better off generating a random unique identifier on the server and using that in your URL. Kind of like confirmation e-mails where you have to click a link.
Otherwise, if there's any other way to not include data in the URL, you shouldn't. In the case of a successful registration, either the person just registered and you should be in a current session, or you should require them to login before they see the customized page.
Why not make "registration success" message be a last step, but not change pages?
You can use Ajax or Server.Transfer() to do that.
I could check against a whitelist of referring URLs so that they can't just type in a different URL. That might eliminate an obvious "hack" from a passerby.
(Obviously, you can get around this if you're a nerd.)
You could make some sort of checksum or hash on the querystring items, so if they mess around with the URL, the checksum fails and it kicks them out to the main page.
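A sketch of that checksum idea (Python as an illustration, with an HMAC so that editing the query string without the server-side secret also breaks the signature; the r parameter comes from the example URLs above):

import hmac, hashlib

SECRET = b"server-side-secret"                # assumption: stored in config, never sent to the client

def sign_role(role_id):
    return hmac.new(SECRET, f"r={role_id}".encode(), hashlib.sha256).hexdigest()

# Build the link as RegistrationSuccess.aspx?r=2876&sig=<sign_role(2876)>;
# on the success page, verify before choosing the message.
def verify_role(role_id, sig):
    # A tampered r no longer matches the signature: bounce them to the main page.
    return hmac.compare_digest(sig, sign_role(role_id))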
