Read log files and get the entries between two dates - Linux

I want to extract information from an access log file that matches a keyword and falls between two dates. For example, I want to find log entries between two dates that contain the text "passwd". For now, I am using the following command, but I am not getting the correct results:
fgrep "passwd" * | awk '$4 >= "[20/Aug/2017" && $4 <= "[22/Aug/2017"'
The date format is [22/Feb/2017:17:28:42 +0000].
I have searched and looked at this post too: extract data from log file in specified range of time, but I don't exactly understand how to use it.
Edit:
Here are some example entries from the access log files:
xxx-access_log:xx.xx.xx.xx - - [22/Feb/2017:17:30:02 +0000] "GET /cms/usr/extensions/get_tree.inc.php?GLOBALS[root_path]=/etc/passwd%00 HTTP/1.1" 404 39798
xxx-access_log:xx.xx.xx.xx - - [22/Feb/2017:17:31:12 +0000] "GET /cgi-bin/libs/smarty_ajax/index.php?_=&f=update_intro&page=../../../../../../../../../../../../../../../../../../etc/passwd%00 HTTP/1.1" 404 30083
xxx-access_log:xx.xx.xx.xx - - [22/Feb/2017:17:31:19 +0000] "GET /download/libs/smarty_ajax/index.php?_=&f=update_intro&page=../../../../../../../../../../../../../../../../../../etc/passwd%00 HTTP/1.1" 404 27982
xxx-access_log:xx.xx.xx.xx - - [22/Feb/2017:17:31:24 +0000] "GET /sites/libs/smarty_ajax/index.php?_=&f=update_intro&page=../../../../../../../../../../../../../../../../../../etc/passwd%00 HTTP/1.1" 404 35256
xxx-access_log:xx.xx.xx.xx - - [22/Feb/2017:17:28:32 +0000] "GET /modx/manager/media/browser/mcpuk/connectors/php/Commands/Thumbnail.php?base_path=/etc/passwd%00 HTTP/1.1" 404 6956
xxx-access_log:xx.xx.xx.xx - - [22/Feb/2017:17:28:42 +0000] "GET /modx/manager/media/browser/mcpuk/connectors/php/Commands/Thumbnail.php?base_path=/etc/passwd%00 HTTP/1.1" 404 6956
Thanks in advance for any help!

The post you linked would be useful if you knew two specific strings that appear in your log file: that command searches for the first string, prints every line until it finds the second string, and then stops.
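For reference, that is the classic sed address-range idiom; a sketch with placeholder strings (note that this simple form prints every matching range rather than stopping after the first):
sed -n '/FIRST-STRING/,/SECOND-STRING/p' access_log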
In your case, if you want generic date manipulation, you might be better off with Perl and one of its date/time modules. Most (if not all) of those have built-in date comparison routines, and many of them will accept the date in almost any format imaginable ... and the ones that don't typically let you specify the date format.
(If you're just using dates and not using times, then Date::EzDate is my favorite, and probably the easiest to learn and implement quickly.)
Shell commands are probably not going to do a good job of date manipulation.
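For illustration, here is a minimal sketch of the Perl route as a one-liner, using the core Time::Piece module rather than Date::EzDate (assumptions: filenames matching *access_log*, inclusive bounds, and comparison on the date part of each entry only):
fgrep "passwd" *access_log* | perl -MTime::Piece -ne '
    BEGIN {
        # Hypothetical bounds; adjust as needed
        $lo = Time::Piece->strptime("20/Aug/2017", "%d/%b/%Y");
        $hi = Time::Piece->strptime("22/Aug/2017", "%d/%b/%Y");
    }
    if (/\[(\d{2}\/\w{3}\/\d{4}):/) {
        my $t = Time::Piece->strptime($1, "%d/%b/%Y");
        print if $t->epoch >= $lo->epoch && $t->epoch <= $hi->epoch;
    }'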

Related

Splunk extraction from log entries

I need to extract payload data from log entries, specifically the PlatformVersion and PlatformClient values. I need this in Python code.
"tracking~2015~526F3D98","2015:1302",164,1,"2022-02-07 11:10:08.744 INFO [threadPoolTaskExecutorTransformed5 - ?] saving event to log =core-server-event-tracking-api, payload={""PlatformVersion"":""6.34.36 - 4.18.6"",""PlatformClient"":""html""},53
"tracking~2015~526F3D98","2015:130",164423,1,"2022-02-07 11:10:08.744 INFO [threadPoolTaskExecutorTransformed5 - ?] saving event to log =core-server-event-tracking-api, payload={""PlatformVersion"":""6.34.37 - 4.18.7"",""PlatformClient"":""xml""},54
I'm not sure how Python and Splunk relate here, but this is just a matter of doing some field extractions.
Something like this should do it:
index=ndx sourcetype=srctp
| rex field=_raw "PlatformVersion\W+(?<platform_version>[^\"]+)"
| rex field=_raw "PlatformClient\W+(?<platform_client>[^\"]+)"
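Outside of Splunk, the same regexes can be sanity-checked from a shell (a sketch: events.csv is a hypothetical file holding the sample lines above, and -P requires GNU grep):
# \K discards everything matched before it, leaving only the value
grep -oP 'PlatformVersion\W+\K[^"]+' events.csv
grep -oP 'PlatformClient\W+\K[^"]+' events.csv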

How to write the grok expression for my log?

I am trying to write a grok pattern to analyze my logs. I use Logstash 7 to collect logs, but after many attempts I have failed to write a working grok pattern.
Log looks like this:
[2018-09-17 18:53:43] - biz_util.py [Line:55] - [ERROR]-[thread:14836]-[process:9504] - an integer is required
My Grok(fake):
%{TIMESTAMP_ISO8601:log_time} - %{USERNAME:module}[Line:%{NUMBER:line_no}] - [%{WORD:level}]-[thread:%{NUMBER:thread_no}]-[process:%{NUMBER:process_no}] - %{GREEDYDATA:log}
Only the timestamp part is OK. The others failed.
This will work:
\[%{TIMESTAMP_ISO8601:log_time}\] - %{DATA:module} \[Line:%{NUMBER:line_no}\] - \[%{WORD:level}\]-\[thread:%{NUMBER:thread_no}\]-\[process:%{NUMBER:process_no}\] - %{GREEDYDATA:log}
You need to escape the [ characters.
This will work too (note that the literal [ and ] must be escaped here as well):
\[%{TIMESTAMP_ISO8601:log_time}\] %{NOTSPACE} %{USERNAME:module} \[Line:%{BASE10NUM:Line}\] %{NOTSPACE} \[%{LOGLEVEL}\]%{NOTSPACE}\[thread:%{BASE10NUM:thread}\]%{NOTSPACE}\[process:%{BASE10NUM:process}\]

Grok Filter String Pattern

I am pretty new to grok and I need to filter a line like the one below:
Dec 20 18:46:00 server-04 script_program.sh[14086]: 2017-12-20 18:46:00 068611 +0100 - server-04.location-2 - 14086/0x00007f093b7fe700 - processname/SIMServer - 00000000173d9b6b - info - work: You have 2 connections running
So far I just managed to get the following:
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
So I get all the date/timestamp details + program + process which is ok.
But that leaves me with the following remaining string:
2017-12-20 18:46:00 068611 +0100 - server-04.location-2 - 14086/0x00007f093b7fe700 - processname/SIMServer - 00000000173d9b6b - info - work: You have 2 connections running
And here I am struggling to break everything into chunks.
I have tried a lot of combinations trying to split it on the hyphens (-), but so far I am failing to do so.
So far I have pretty much been using the following as a guideline:
https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
Any help/suggestions/tips on this please?
I am using graylog2 and, as shown above, I am trying to use grok to filter my messages.
Many thanks
I managed to get my filter fully done and working, so the solution is below:
SERVER_TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?[T ]%{INT}[T ]%{ISO8601_TIMEZONE}?
SERVER_HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
SERVER_Unknown %{SERVER_HOSTNAME}[/]%{SERVER_HOSTNAME}
SERVER_Loglevel ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
SYSLOGBASE_SERVER %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:[T ]%{SERVER_TIMESTAMP_ISO8601:timestamp_match}[T ]-[T ]%{SERVER_HOSTNAME:SERVER_host_node}[T ]-[T ]%{SERVER_Unknown:SERVER_Unknown}[T ]-[T ]%{SERVER_Unknown:service_component}[T ]-[T ]%{SERVER_HOSTNAME:process_code_id}[T ]-[T ]%{SERVER_Loglevel}[T ]-[T ]%{GREEDYDATA:syslog_message}
All the rest are regular expressions from the standard grok patterns.
Many thanks

CUPS uses the wrong DISPLAY parameter

We are using the Xerox Linux drivers to print on our multifunction printer. Basically, when you print, the driver opens a pop-up window to let you choose printing options, then calls lp to do the printing.
This works pretty well on a single-user computer, but when many users are logged in at the same time on the machine, the driver doesn't know which DISPLAY to use (:0, :1, :2, etc.). So when printing, the pop-up appears on :0 even though the user may be on :1 or :2.
The printing subsystem runs as an OS user (lp on Debian). That user has no X session and therefore no DISPLAY value, so the driver falls back to :0, the typical single-user display. With user switching, CUPS does not forward the requesting user's DISPLAY, so user2's driver pop-up ends up on user1's display.
Here is a snippet of the log from a print job. You can see that I started the job as tech, but lp is the user doing the printing:
localhost - tech [06/May/2016:15:06:42 -0400] "POST / HTTP/1.1" 200 362 Create-Printer-Subscriptions successful-ok
localhost - lp [06/May/2016:15:06:55 -0400] "POST /printers/xeroxtq1 HTTP/1.1" 200 346 Create-Job successful-ok
localhost - lp [06/May/2016:15:06:55 -0400] "POST /printers/xeroxtq1 HTTP/1.1" 200 33861 Send-Document successful-ok
I am not looking for a full walkthrough solution here (though if you have one, I won't spit on it), but for some hints on what I should try. I've thought of:
1 - Disabling user switching in GNOME3, but this is a last resort solution since it is quite useful for users
2 - Forcing CUPS to call lp with the -o DISPLAY option, grepped from the user that called the process. If this were feasible, it would be quite nice.
3 - Forcing GNOME3 to show the currently active user on :0 and move idle ones to other displays.
I have no idea how #2 could be done and I'm not sure if #3 is feasible.
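To make #2 concrete, here is a very rough, untested sketch of the kind of wrapper I imagine in front of the driver (the paths are hypothetical; it assumes the standard CUPS filter argument convention and one X session per user):
#!/bin/sh
# Untested sketch. CUPS invokes filters as: job-id user title copies options [file],
# so $2 should be the name of the user who submitted the job.
JOB_USER="$2"
# On many systems `who` lists a local X display for graphical sessions,
# e.g. "tech  tty7  2016-05-06 15:00 (:1)". Fall back to :0 if none is found.
USER_DISPLAY=$(who | awk -v u="$JOB_USER" \
    '$1 == u && $NF ~ /^\(:[0-9]/ { gsub(/[()]/, "", $NF); print $NF; exit }')
export DISPLAY="${USER_DISPLAY:-:0}"
# The driver would most likely also need the user's XAUTHORITY to open the display.
exec /usr/lib/cups/filter/xerox_driver_real "$@"   # hypothetical path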
I've already tweaked GNOME3 to log off users that are idle for more than 30 minutes, but that's not enough to solve the problem.
Any help?

Counting requests and status codes per URI in a webserver log

Given a typical webserver log file that contains a mixture of absolute URLs, relative URLs, human requests and bots (some sample lines):
112.77.167.177 - - [01/Apr/2016:22:40:09 +1100] "GET /bad-credit-loans/abc/ HTTP/1.1" 200 7532 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
189.181.124.177 - - [31/Mar/2016:23:10:47 +1100] "GET /build/assets/css/styles-1a879e1b.css HTTP/1.1" 200 31654 "https://www.abc.com.au/customer-reviews/" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_2_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13D15 Safari/601.1"
110.76.15.146 - - [01/Apr/2016:00:25:09 +1100] "GET http://www.abc.com.au/car-loans/low-doc-car-loans/ HTTP/1.1" 301 528 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
I'm looking to list all the URIs requested, with their status code (200, 302, etc.) and the total count of requests, i.e.:
http://www.abc.com.au 301 3,900
/bad-credit-loans/abc/ 200 123
/bad-credit-loans/abc/ 302 7
Were it not for the varying IP addresses, timestamps, referring URLs, and user agents, I would be able to combine uniq and sort in the standard fashion. Or, if I knew all the URLs in advance, I could simply loop over each URL/status-code combination with grep in its simplest form.
How do we disregard the varying items (user agents, timestamps etc.) and extract just the URLs and their frequency of status code?
You should just recognize that the interesting parts are always at constant field positions (with respect to space-separated fields).
The URL is at position 7 and the status code is at position 9.
The rest is trivial. You may e.g. use:
awk '{sum[$7 " " $9]++;tot++;} END { for (i in sum) { printf "%s %d\n", i, sum[i];} printf "TOTAL %d\n", tot;}' LOGFILES
Then sort the result with sort if you need the output ordered.
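For example, a simplified variant without the TOTAL line, ordered by count descending (the count ends up in the third space-separated field of the output):
awk '{sum[$7 " " $9]++} END {for (i in sum) print i, sum[i]}' LOGFILES | sort -k3,3nr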
