What to store in a typical access log?

I have thought of the following:
user id if available
user ip address
timestamp
action executed
Am I missing something? Are there any guidelines?

There are different kinds of access logs really. The most common ones are for your page access, and may have the format Sir Darius describes (this is typically called an access log).
Then there's also the logging of internal actions (this is typically called an application log). Many of those will be on a low logging level (meaning that you normally don't see them, but have the ability to switch them on temporarily).
If you don't take precautions, you will get a log like:
Query XYZ executed in 2ms
Query ABC executed in 1ms
Starting transaction
Order send
Starting transaction
Order deleted
Query ABC executed in 1ms
When investigating a production issue, this is often not really useful. Every other line can belong to the same user or to different users. You don't know.
I found it useful to prefix every such log line with the following fields:
Time
IP address
Session ID
User ID
Thread ID/name
Sequence ID
The thread ID or name is important so you can distinguish the situation where the same user is doing multiple requests to your app at the same time.
The sequence ID is a per-session counter that is incremented for every request the user makes since the beginning of their session (in Java I used an AtomicInteger for this). It is easier to grep on than the thread ID when examining everything that took place during a specific request, because thread IDs are re-used when serving completely different requests. It's also handy when a single request is handled internally by multiple threads.
With a little effort, a log format like this allows you to extract the actions of a single user from your log and zoom in to individual requests.
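As a rough illustration of the prefix described above, here is a minimal Python sketch. The names Session, RequestContext and format_line, the sid=/uid=/thread=/seq= labels, and the field order are all assumptions made for this example; the original answer used Java with an AtomicInteger, which is approximated here by a lock-protected counter.

```python
import itertools
import threading
from datetime import datetime, timezone


class RequestContext:
    """Per-request fields that prefix every application log line."""

    def __init__(self, ip, session_id, user_id, sequence_id):
        self.ip = ip
        self.session_id = session_id
        self.user_id = user_id
        self.sequence_id = sequence_id


class Session:
    """Hypothetical session object that owns the per-session request counter."""

    def __init__(self, session_id, user_id):
        self.session_id = session_id
        self.user_id = user_id
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_request(self, ip):
        # Hand out one sequence ID per incoming request of this session.
        with self._lock:
            seq = next(self._counter)
        return RequestContext(ip, self.session_id, self.user_id, seq)


def format_line(ctx, message, now=None, thread_name=None):
    """Build one log line: time, IP, session, user, thread, sequence, message."""
    now = now or datetime.now(timezone.utc)
    thread_name = thread_name or threading.current_thread().name
    return (
        f"{now.isoformat(timespec='milliseconds')} {ctx.ip} "
        f"sid={ctx.session_id} uid={ctx.user_id} "
        f"thread={thread_name} seq={ctx.sequence_id} {message}"
    )
```

With lines shaped like this, grepping for a session ID isolates one user, and adding the sequence ID narrows the output to a single request even when several threads worked on it.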

There are guidelines that should be used if you intend to use the access logs in order to gather statistics for tools like AWStats or Webalizer.
For example there is the Combined Log Format:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
defined in Apache as:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
This format is commonly used around the web and is understood by most software.
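If you want to read such lines yourself rather than feed them to an existing tool, a small parser might look like the following sketch. The regular expression and the parse_combined name are my own; the pattern assumes that no field contains an escaped double quote, which real-world logs do not always guarantee.

```python
import re

# One named group per field of the Combined Log Format shown above.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Apache writes "-" for %b when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```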
The W3C defines another format called Extended Log File Format, which is specified here: http://www.w3.org/TR/WD-logfile.html
This format is used for example by IIS, and is understood by AWStats.

Related

I need to measure the response time for a file export using the TruClient protocol in LoadRunner

I need to measure the response time for a file export using the TruClient protocol in LoadRunner. After I click the export button, the file is downloaded, but I am not able to measure the download time accurately.
Pull that data from the HTTP request log, which will show the download request, and, if the w3c time-taken value is included in the log, the time required to fulfill the download.
You can process the log at the end of the test for the response time data. If you need to, you can import a set of data points into Analysis for representation with the rest of your data. You might want to consider a normalized value for your download instead of a raw response time. I imagine that the files are of different sizes, so naturally they will have different download times. However, if you divide the downloaded bytes by the time (in seconds), you get a normalized measurement of bytes per second, which allows you to compare one download to the next for consistent operation.
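The normalization is simple arithmetic; a minimal sketch follows. The function name is hypothetical, and it assumes you have already converted the logged duration to seconds (check which unit your server writes for time-taken before relying on it).

```python
def bytes_per_second(size_bytes, seconds):
    """Normalize a download to throughput so files of different sizes compare."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes / seconds


if __name__ == "__main__":
    # Two downloads of different sizes, expressed as (bytes, seconds).
    for size, duration in [(1_000_000, 2.0), (5_000_000, 10.0)]:
        print(size, duration, bytes_per_second(size, duration))
```

Both sample downloads have different raw response times but the same throughput, which is the property you want to track across the test.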
Also, keep in mind that since you are downloading a file, writing to a local disk, for (presumably) multiple users on a host, you will face the risk of turning your local file system into a bottleneck. You can see this same effect if you turn up logging on all users to the highest level and run your test. The wait for lock and wait for write, plus the actual writing of data, becomes a drag anchor to the performance of your virtual user. This is why the recommended log level is "log on error" or send the error to the output window of the controller via lr_output_message() or lr_vuser_status_message(). Consider a control load generator of the same hardware definition as the others with only a single virtual user of this type on it. If the control group and global group degrade together then you have an app issue. If your control user does not degrade, but your other users do, then you have a test bed induced influence on your results.
These are all issues independent of the tool you are using for the test.

Spamassassin - score by time of day sent

Is there a way to assign a score for mail sent between certain hours? I find a lot of spam is sent in the middle of the night, so I would like to give anything between, say, 2am and 5am a score of 2 or 3.
You can use SpamAssassin to penalize mail received within certain hours, but it's messy.
Before we start, verify that SA's primary defenses are properly set up:
DNS Blocklists, including DNSBLs & URIBLs, are a necessity; set them up before all else
Bayes in SpamAssassin is another must-have, though it requires training
Use Razor for fuzzy matching (see also Installing Razor) if the license works for you.
If that's insufficient, then you can address the sort of issue you're keying on. Try:
The RelayCountry plugin to penalize countries you never converse with
The TextCat plugin to discriminate against the languages you never converse in
If all that doesn't help enough, then (and only then) you can consider what you proposed. Read on.
Don't forget about time zones. You can't use the Date header for this reason. This type of rule is not safe for deployments that have conversations that span too many time zones, and you must ensure the MX record servers are all consistent and on the same time zone. Be aware that daylight savings (aka “summer time”) can be annoying here.
Identify a Received header that your own receiving infrastructure adds before SpamAssassin runs (so SA can see it). It will appear near the top of your email. Again, make sure it's actually visible to SpamAssassin; the Received header added by your IMAP server will not be visible.
It is possible that you have SpamAssassin configured to run before any internal relay is stamped into the message. If this is the case, do not proceed further as you cannot reliably determine the local time.
Okay, all caveats aside, here's an example Received header:
Received: from external-host.example.com
(external-host.example.com [198.51.100.25])
by mx.mydomain with ESMTPS id ABC123DEF456;
Fri, 13 Mar 2020 12:34:56 -0400 (EDT)
This must be a header one of your systems adds or else it could have a different time zone, clock skew, or even a forged timestamp.
Match that in a rule that clearly denotes you as the author (by convention, start it with your initials):
header CL_RCVD_WEE_HOURS Received =~ /\sby\smx\.mydomain\swith\s[^:]{9,64}+:(?<=[0 ][2-4]:)[0-9:]{5}\s-0[45]00\s/
describe CL_RCVD_WEE_HOURS Received by our mx.mydomain relay between 2a and 5a EST/EDT
score CL_RCVD_WEE_HOURS 0.500
A walk through that regex (see also an interactive explanation at Regex101):
First, you need to verify that it's your relay, matched by name: by mx.mydomain with
Then, skip ahead 9-64 non-colon characters (quickly, with no backtracking, thus the + sign). You'll need to verify your server doesn't have any colons here
The real meat is in a look-behind (since we actually skipped over the hour for speed purposes), which seeks the leading zero (or else a space) and then the 2, 3, or 4 (not 5 since we don't want to match a time like 05:59:59)
Finally, there's a sanity check to ensure we're looking at the right time zone. I assumed you're in the US on the east coast, which is -0400 or -0500 depending on whether daylight savings is in effect
So you'll need to change the server name, review whether the colon trick works with your relay, and possibly adjust the time zone regex.
I also gave this a lower score than you desired. Start low and slowly raise it as needed. 3.000 is a really high value.
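Before deploying a rule like this, it can help to check the pattern against a few sample headers offline. The Python sketch below does that under some assumptions: the possessive + after {9,64} is dropped so the pattern also compiles on Python versions older than 3.11 (the match result is the same, because the skipped characters can never include a colon), and the sample header is the made-up one from above. It only exercises the regex logic, not how SpamAssassin itself presents headers to a rule.

```python
import re

# Same pattern as the rule above, minus the possessive quantifier.
WEE_HOURS = re.compile(
    r"\sby\smx\.mydomain\swith\s[^:]{9,64}:(?<=[0 ][2-4]:)[0-9:]{5}\s-0[45]00\s"
)


def received_in_wee_hours(header_value):
    """True if the Received header was stamped between 02:00:00 and 04:59:59."""
    return WEE_HOURS.search(header_value) is not None


def sample_header(hhmmss, zone="-0400"):
    """Build a test Received value shaped like the example header."""
    return (
        "from external-host.example.com\n"
        " (external-host.example.com [198.51.100.25])\n"
        " by mx.mydomain with ESMTPS id ABC123DEF456;\n"
        f" Fri, 13 Mar 2020 {hhmmss} {zone} (EDT)"
    )
```

Swap in real headers from your own relay to confirm that the colon trick and the time zone check behave as intended.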

What is the "t=" query parameter in a socket.io handshake

A socketIO handshake looks something like this :
http://localhost:3000/socket.io/?EIO=3&transport=polling&t=M5eHk0h
What is the t parameter? I can't find an explanation.
This is the timestampParam from engine.io-client. Its value is a Unique ID generated using the npm package yeast.
This is referenced in the API docs under the Socket Constructor Options (docs below). If no value is given to timestampParam when creating a new instance of a Socket, the parameter name defaults to t and is assigned a value from yeast(). You can see this in the source on line 223 of lib/transports/polling.js.
Socket constructor() Options
timestampParam (String): timestamp parameter (t)
To clarify where engine.io-client comes into play: it is a dependency of socket.io-client, which socket.io depends on. engine.io provides the actual communication layer implementation which socket.io is built upon. engine.io-client is the client portion of engine.io.
Why does socket.io use t?
As jfriend00 pointed out in the comments, t is used for cache busting. Cache busting is a technique that prevents the browser from serving a cached resource instead of requesting the resource.
Socket.io implements cache busting with a timestamp parameter in the query string. If you assign timestampParam a value of ts, then the key for the timestamp is ts; it defaults to t if no value is assigned. By assigning this parameter a unique value created with yeast on every poll to the server, Socket.io is able to always retrieve the latest data from the server and circumvent the cache. Since polling transports would not work as expected without cache busting, timestamping is enabled by default and must be explicitly disabled.
AFAIK, the Socket.io server does not utilize the timestamp parameter for anything other than cache busting.
More about yeast()
yeast() guarantees a compressed unique ID specifically for cache busting. The README gives us some more detailed information on how yeast() works.
Yeast is a unique id generator. It has been primarily designed to generate a unique id which can be used for cache busting. A common practice for this is to use a timestamp, but there are couple of downsides when using timestamps.
The timestamp is already 13 chars long. This might not matter for 1 request but if you make hundreds of them this quickly adds up in bandwidth and processing time.
It's not unique enough. If you generate two stamps right after each other, they would be identical because the timing accuracy is limited to milliseconds.
Yeast solves both of these issues by:
Compressing the generated timestamp using a custom encode() function that returns a string representation of the number.
Seeding the id in case of collision (when the id is identical to the previous one).
To keep the strings unique it will use the . char to separate the generated stamp from the seed.
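To make the README's description concrete, here is a rough Python sketch of the same idea: encode the millisecond timestamp in a compact alphabet and append a seed after a "." when two IDs fall in the same millisecond. This is not the package's code; the 64-character alphabet, the IdGenerator class and the injectable clock are assumptions made for illustration.

```python
import time

# Assumed URL-safe alphabet of 64 symbols, so each character carries 6 bits.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_"


def encode(num):
    """Return a compact base-64 style string for a non-negative integer."""
    encoded = ""
    while True:
        num, remainder = divmod(num, len(ALPHABET))
        encoded = ALPHABET[remainder] + encoded
        if num == 0:
            return encoded


class IdGenerator:
    """Generate cache-busting IDs that stay unique within one millisecond."""

    def __init__(self, clock_ms=None):
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._prev = None
        self._seed = 0

    def next_id(self):
        stamp = encode(self._clock_ms())
        if stamp != self._prev:
            # New millisecond: the bare stamp is unique, so reset the seed.
            self._prev = stamp
            self._seed = 0
            return stamp
        # Same millisecond as the previous ID: disambiguate with a seed.
        suffix = encode(self._seed)
        self._seed += 1
        return f"{stamp}.{suffix}"
```

With this alphabet a present-day millisecond timestamp encodes to 7 characters instead of 13 digits, which matches the length of the t value in the handshake URL above.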

Synchronize (Replicate) IMAP Messages

I have an IMAP account (e.g. some#gmail.com), and I know of many libraries with which I can connect and replicate messages to my destination. I want to achieve the following:
The first time, I want to download all messages (including sent folders), and I will save each message with its ID and UID locally in a database.
The second time, I do not want to query already downloaded messages, even if their read/unread status or any other flag (including the deleted flag) has changed or they have been purged.
Our aim is to download and sync every message locally once, the first time only.
I know a little about IMAP messages: they have something called ID, UID and MessageID. ID is probably an offset in the current folder, UID is a numeric ID in the current account, and MessageID is a unique string.
What search should I use when querying a folder so that messages, once downloaded, won't be returned to me again?
I am planning to use the http://mailsystem.codeplex.com/ library, which provides the ability to search with a custom string and returns an int array.
Assuming I have a MaxID and want to download only messages whose ID or UID is greater than MaxID, which one should I use: UID or ID?
You should use the UID in combination with UIDVALIDITY. Both values are folder specific.
There is an informational RFC that describes how IMAP clients should do synchronization (RFC-4549, section 4.3). The text recommends issuing the following two commands:
tag1 UID FETCH <lastseenuid+1>:* <descriptors>
tag2 UID FETCH 1:<lastseenuid> FLAGS
The first command is used to fetch the required information for all unknown mails (without knowing how many mails there are). The second command is used to synchronize the flags for the already seen mails.
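A minimal sketch of how a client might build those commands and guard the stored state with UIDVALIDITY is shown below. The function names and the default descriptor list are assumptions for illustration, and no server connection is made. Two details are worth encoding: if UIDVALIDITY has changed, the stored UIDs are no longer meaningful and the folder has to be fetched again from scratch; and a range such as 101:* always includes the message with the highest UID, even when that UID is below 101, so the client should discard UIDs it has already seen.

```python
def plan_sync(stored_uidvalidity, server_uidvalidity, last_seen_uid,
              descriptors="(FLAGS RFC822.SIZE ENVELOPE)"):
    """Return the UID FETCH commands (without tags) for one folder."""
    if stored_uidvalidity != server_uidvalidity:
        # Cached UIDs are invalid for this folder: start over.
        return [f"UID FETCH 1:* {descriptors}"]
    commands = [f"UID FETCH {last_seen_uid + 1}:* {descriptors}"]
    if last_seen_uid > 0:
        # Optional flag refresh for mails that were already downloaded.
        commands.append(f"UID FETCH 1:{last_seen_uid} FLAGS")
    return commands


def new_uids(fetched_uids, last_seen_uid):
    """Drop UIDs the client already has (n:* can return the last old mail)."""
    return sorted(uid for uid in fetched_uids if uid > last_seen_uid)
```

If flag changes on already downloaded mail do not matter to you, as in the question, the second command can simply be skipped.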
AFAIK this method is widely used. Therefore, many IMAP servers contain optimizations in order to provide this information quickly. Typically, the network bandwidth is the limiting factor.

Full statement from ISO 8583

I would like to know if it is possible to do a full statement (between a date range) through ISO 8583. I have seen ATMs which do full statements and was wondering what method they used. I know balance inquiries and mini statements are possible on a POS device over 8583.
If it is possible, does anyone have any information on the structure of the message, ideally for Flexcube?
We did something similar back in 1999 at one of the banks, where we sent the statement data in one of the generic private-use fields, which allowed the format ANS 999.
But that means you either have to restrict the data to fewer than 999 characters or split the data across multiple messages and have a multi-legged transaction.
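As an illustration of the splitting step only, the sketch below packs fixed-width statement lines into chunks that each fit a 999-character field, one chunk per message leg. The 40-character line width and the function name are assumptions made for this example; the real field layout and leg sequencing depend on your host and switch specification.

```python
MAX_FIELD_LEN = 999  # capacity of the ANS 999 private-use field
LINE_WIDTH = 40      # assumed printer line width, not taken from any spec


def split_statement(lines, max_len=MAX_FIELD_LEN, width=LINE_WIDTH):
    """Pad lines to a fixed width and group whole lines into field-sized chunks."""
    per_message = max_len // width
    if per_message == 0:
        raise ValueError("line width exceeds field capacity")
    padded = [line[:width].ljust(width) for line in lines]
    return [
        "".join(padded[i:i + per_message])
        for i in range(0, len(padded), per_message)
    ]
```

With these numbers each leg carries 24 whole lines (960 characters), so a 50-line statement needs three legs.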
You would have the following flow:
The customer requests a statement at the ATM
The ATM sends an NDC/D912 message to the ATM Switch
The ATM Switch looks up the account number after authenticating the card and forwards the request to the Core Banking Application
The Core Banking Application generates the statement, formats it according to a predesigned template, and sends the statement data in a generic field (say 72)
The ATM Switch collects the data and formats it to NDC or D912 format, where the statement data is tagged for the statement printer (in NDC this is a field called q, whose value should be ‘8’ - Print on statement printer only), and the preformatted data is placed in field r
However, this is not good practice, since we have faster means to generate a statement and send it by email or internet banking. But this is the bank's preference anyway.
It depends on the implementation.
I implemented an NCR central switch, where I incorporated the initial checks in the Central application itself rather than passing everything to the Auth Host.
My implementation:
The ATM sends (NDC) the transaction requests, based on the state machine set up in the ATM, to the Central application.
Central does basic checks, such as the validity of the BIN (the initial 6 digits of the card number) and whether the requested amount of cash is available in the ATM.
The Central application then sends the packet (ISO8583/BASE24) to the Acquirer for further processing.
The Acquirer sends it to the CA, and then it goes to the Issuer for approval.
Hope this helps.
The mini-statement is not part of ISO 8583 (or MVA). It is usually implemented as a proprietary extension. Hence you need to go to an ATM that is owned by your bank or that is part of a consortium of banks sharing an ATM infrastructure with your bank.
We implemented mini-statements in our ISO-8583 specification utilizing a $0.00 0200 (DE003 = 91xxxx) message and the statement data coming back from the host on DE125 on both Connex and Base24 and then modified our stateful loads to print the data at the ATM.
Full statements fell out of use years ago, so we removed them and now offer only mini-statements, which use the receipt printer rather than full-page statements. There is a limited number of entries and not all hosts support it, but it is used today on NCR & Diebold ATMs. I've personally participated in the testing to get it working on Base24 and Postilion.
The mini-statement data we do print is 40 characters per line and prints about 10 transactions I believe.
