How to understand and get information from a log string - browser

I have a user log example like this:
141.154.49.202 - - [21/Jul/2021:14:26:42 +0000]
"GET /projects/operation/report/index.htm?product=home&msid=fg552595-976c-58dg-0689-c3d8ffb0d904&zip=90221&ho_prod=G&prog_status=ghsdsq&track_id=89e05bmb3eg05fef095929d1c39106&cust_id=&quoteStartID=916d1b0d-118f056g2-0705-916bgdd4bf47 HTTP/1.1"
302 5
"https://www.yourwebiste.com/"
"Mozilla/5.0 (iPad; CPU OS 13_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.2.3 Mobile/26F259 Safari/715.2" 0.029 0.000 0.029 0.029 -
Based on the following answers, I was able to work out part of the information in the log:
How to read useragent details
Understanding Apache's access log
Here is what I got:
141.154.49.202: user IP address
[21/Jul/2021:14:26:42 +0000]: date, time and timezone
Mozilla/5.0: product
iPad; CPU OS 13_5 like Mac OS X: device and system-information
AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.2.3: platform and details
Mobile/26F259: device type
Safari/715.2: browser information
Now, my question is: what does the rest of the log stand for? Specifically:
What do the two URLs (first: "GET URL HTTP/1.1"; second: "www.yourwebiste.com/") mean? I guess one is the page the user visited, but what is the other?
What is the meaning of the 302 5 between the two URLs?
What does the series of numbers at the end (0.029 0.000 0.029 0.029) refer to?
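For reference, here is a minimal sketch of splitting such a line into named fields with a regex. It assumes a combined log format (referrer and user agent in quotes) with extra timing fields appended by a custom LogFormat; the meaning of the four trailing decimals is server-specific, not standard, and the sample line below is trimmed for brevity:

```python
import re

# Sketch of an extended combined log format parser. The trailing timing
# fields are an assumption based on common custom LogFormat setups.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
    r'(?P<timings>.*)'
)

line = ('141.154.49.202 - - [21/Jul/2021:14:26:42 +0000] '
        '"GET /projects/operation/report/index.htm?product=home HTTP/1.1" '
        '302 5 "https://www.yourwebiste.com/" '
        '"Mozilla/5.0 (iPad; CPU OS 13_5 like Mac OS X) AppleWebKit/605.1.15" '
        '0.029 0.000 0.029 0.029 -')

fields = LOG_PATTERN.match(line).groupdict()
print(fields['ip'])        # client IP address
print(fields['status'])    # HTTP status code (302 = redirect)
print(fields['referrer'])  # the page the visitor came from
```

With named groups, each question above maps to one field: the quoted request is the page visited, the second URL is the referrer, and 302 5 are the status code and response size.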

Related

MDNS causes local HTTP requests to block for 2 seconds

It takes 0.02 seconds to send a message via the Python call requests.post(http://172.16.90.18:8080, files=files), but 2 seconds via requests.post(http://sdss-server.local:8080, files=files).
The following chart shows the packets I captured with Wireshark; from frames 62 to 107 in the first column, you can see that it took 2 seconds for mDNS to resolve the domain name.
My system is Ubuntu 18.04. Following this link (Mac OS X slow connections - mdns 4-5 seconds - bonjour slow), I edited the /etc/hosts file and changed this line to
127.0.0.1 localhost sdss-server.local
After the modification, it still takes 2 seconds to send the message via requests.post(http://sdss-server.local:8080, files=files).
Normally it should take 0.02 to 0.03 seconds. What should I do to fix this and reduce the time from 2 seconds to 0.02 seconds?
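One way to confirm that name resolution (rather than the HTTP transfer) is the slow step is to time getaddrinfo directly, which is the same resolver path requests uses. A minimal sketch, with the .local hostname from the question left commented out since it only exists on that network:

```python
import socket
import time

def time_resolution(hostname, port=8080):
    """Return how many seconds name resolution takes for a hostname."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror as exc:
        print(f"resolution failed: {exc}")
    return time.monotonic() - start

# 'localhost' should resolve near-instantly via /etc/hosts; a large gap
# for the .local name would point at mDNS rather than the HTTP request.
print(f"localhost: {time_resolution('localhost'):.3f}s")
# print(f"sdss-server.local: {time_resolution('sdss-server.local'):.3f}s")
```

If the .local name alone shows the 2-second delay, the /etc/hosts entry is not being consulted before mDNS, and the resolver configuration is the place to look.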

How to get the ISA06 Interchange Sender ID of an FTP transfer in debian linux

I have been following a trail of breadcrumbs for a couple of days now. My company needs a simple API/EDI integration built that can communicate with a bunch of different marketplaces. One of them requires that I give them the ISA Interchange Sender ID just to make FTP requests to their server.
Here is a link to a page explaining what exactly ISA06 is in ANSI X12. The relevant content is copied from the site.
The ISA Segment has the following structure
ISA01 Authorization Information Qualifier : min/max – 2/2
ISA02 Authorization Information : min/max – 10/10
ISA03 Security Information Qualifier : min/max – 2/2
ISA04 Security Information : min/max – 10/10
ISA05 Interchange ID Qualifier : min/max – 2/2
ISA06 Interchange Sender ID : min/max – 15/15
ISA07 Interchange ID Qualifier : min/max – 2/2
ISA08 Interchange Receiver ID : min/max – 15/15
ISA09 Interchange Date : min/max – 6/6
ISA10 Interchange Time : min/max – 4/4
ISA11 Interchange Control Standards ID : min/max – 1/1
ISA12 Interchange Control Version Number : min/max – 5/5
ISA13 Interchange Control Number : min/max – 9/9
ISA14 Acknowledgment Requested : min/max – 1/1
ISA15 Test Indicator : min/max – 1/1
ISA16 Subelement Separator : min/max – 1/1
Link to full page: http://edicrossroad.blogspot.com/2008/12/isa-and-gs-segment-elements-enumeration.html
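Since ISA elements are purely positional, pulling ISA06 out of a raw interchange is a simple split. A sketch, with a made-up sample segment (real files declare their own element separator, and the values below are placeholders):

```python
def parse_isa(segment: str, element_sep: str = "*"):
    """Split a fixed-structure ISA segment into its 16 numbered elements."""
    parts = segment.split(element_sep)
    if parts[0] != "ISA" or len(parts) < 17:
        raise ValueError("not a valid ISA segment")
    # Elements are positional: parts[6] is ISA06, the Interchange Sender ID,
    # a 15-character field padded with trailing spaces per the table above.
    return {f"ISA{i:02d}": parts[i] for i in range(1, 17)}

# Hypothetical sample interchange header (all values are placeholders):
sample = ("ISA*00*          *00*          *ZZ*SENDERID       "
          "*ZZ*RECEIVERID     *211231*1200*U*00401*000000001*0*P*:")
elements = parse_isa(sample)
print(elements["ISA06"].strip())
```

Note that ISA06 is something you and the trading partner agree on (often a DUNS number or mutually assigned ID), so it is defined in your outbound file rather than discovered from the FTP transfer itself.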
I can't find any information on how to inspect the entire request in plain text format. It needs to be FTP, but even a curl example would be great right about now and would put me on track. The regular curl_getinfo function does not go into enough detail to mention ISA at all.
I do see a bunch of different proprietary parsers you can buy a license for, but that's overkill for our needs (which are just to transfer a couple of .csv files over FTP once a day to update information with the marketplace).
Any help would be greatly appreciated.
I've had some luck with EDI.Net (open source) and EdiFabric (closed source). Both are excellent libraries for generating and receiving feeds like the above. For manual work, X12 Studio is good for beginners, but I personally like to use Sublime.
Here is a collection of tools if you are looking for something else: https://github.com/michaelachrisco/Electronic-Interchange-Github-Resources

Can I use TCP dump to just get the host/domain/ip and port of a packet so it can be easily parsed by PHP?

I am trying to collect hostnames/IPs and ports from tcpdump.
I get fairly close using:
-s 0 -A -q 'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'
but it contains way too much garbage and I don't see a logical way to parse it:
18:04:26.935060 IP 51.234.18.40.60495 > 74.125.226.201.80: tcp 664
E...>)#.#...3..(J}...O.Pqc.y.rs......h.....
.......UGET /embed/QobxnFYhMos HTTP/1.1
Host: www.youtube.com
Connection: keep-alive
Referer: http://www.businessinsider.com/fake-house-pumping-stations-2014-1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
x-wap-profile: http://device.sprintpcs.com/Samsung/SPH-L710/MK3.rdf
User-Agent: Mozilla/5.0 (Linux; U; Android 4.3; en-us; SPH-L710 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
Accept-Encoding: gzip,deflate
Accept-Language: en-US
Accept-Charset: utf-8, iso-8859-1, utf-16, *;q=0.7
Cookie: VISITOR_INFO1_LIVE=lFsDZ5g6OfM; YSC=Ofbb0cz2kXU; PREF=fms1=10000&fms2=10000&f1=50000000&fv=0.0.0
What is tripping you up here is that by setting the snaplen to 0 (-s 0) you are effectively setting it to the default of 65535. That's why you get all the payload content in your capture.
From man tcpdump:
-s
Snarf snaplen bytes of data from each packet rather than the default of 65535 bytes. Packets truncated because of a limited snapshot are indicated in the output with ``[|proto]'', where proto is the name of the protocol level at which the truncation has occurred. Note that taking larger snapshots both increases the amount of time it takes to process packets and, effectively, decreases the amount of packet buffering. This may cause packets to be lost. You should limit snaplen to the smallest number that will capture the protocol information you're interested in. Setting snaplen to 0 sets it to the default of 65535, for backwards compatibility with recent older versions of tcpdump.
Try lowering that value and you should get a neat line of output for each packet, easily parseable for IP addresses and ports with any regex function, be it php or whatever.
Edit: Forgot to say that you may want to try starting with a snaplen value of 96. I think that is the default... if it is, you may want to leave out the option altogether. Then you can move up or down depending on how that works for you.
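Once the payload noise is gone, each tcpdump summary line can be parsed with a short regex. A sketch in Python (the sample line is taken from the question; the same pattern would work in PHP's preg_match):

```python
import re

# Matches tcpdump's "IP src.sport > dst.dport:" summary format.
SUMMARY = re.compile(
    r'IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > '
    r'(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):'
)

line = "18:04:26.935060 IP 51.234.18.40.60495 > 74.125.226.201.80: tcp 664"
m = SUMMARY.search(line)
print(m.group('src'), m.group('sport'))  # 51.234.18.40 60495
print(m.group('dst'), m.group('dport'))  # 74.125.226.201 80
```

Note tcpdump joins the port to the address with a dot, so the last dotted field of each endpoint is the port, not part of the IP.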

Why is kmeans so slow on high spec Ubuntu machine but not Windows?

My Ubuntu machine's performance is terrible for R kmeans {stats}, whereas Windows 7 shows no problems.
X is a 5000 x 5 matrix (numerical variables).
k = 6
My desktop machine is a Dell Precision T3500 with an Intel Xeon CPU W3530 @ 2.80GHz x 8 (i.e., 8 cores), running Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-58-generic x86_64) with 24 GB RAM.
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" Copyright (C) 2013
The R Foundation for Statistical Computing Platform:
x86_64-pc-linux-gnu (64-bit)
> system.time(X.km <- kmeans(X, centers=k, nstart=25))
user system elapsed
49.763 52.347 103.426
Compared to a Windows 7 64-bit laptop with Intel Core i5-2430M @ 2.40GHz, 2 cores, 8 GB RAM, R 3.0.1, and the same data:
> system.time(X.km <- kmeans(X, centers=k, nstart=25))
user system elapsed
0.36 0.00 0.37
Much, much faster. The problem still exists with nstart=1; I just used nstart=25 to amplify the execution time.
Is there something obvious I'm missing?
Try it for yourselves, see what times you achieve:
set.seed(101)
k <- 6
n <- as.integer(10)
el.time <- vector(length=n)
X <- matrix(rnorm(25000, mean=0.5, sd=1), ncol=5)
for (i in 1:n) { # sorry, not clever enough to vectorise
el.time[i] <- system.time(kmeans(X, centers=k, nstart=i))[[3]]
}
print(el.time)
plot(el.time, type="b")
My results (ubuntu machine):
> print(el.time)
[1] 0.056 0.243 0.288 0.489 0.510 0.572 0.623 0.707 0.830 0.846
Windows machine:
> print(el.time)
[1] 0.01 0.12 0.14 0.19 0.20 0.21 0.22 0.25 0.28 0.30
Are you running Ubuntu in a virtual machine? If that were the case, I could see why the results are much slower, depending on how much memory, how many processors, and how much disk space was allocated to the VM. If it isn't running in a VM, then the results are puzzling. I would want to see performance counters for each of the runs (CPU usage, memory usage, etc.) on both systems. Otherwise, the only thing I can think of is that the code "fits" in the cache of your Windows system but doesn't on the Linux system. The Xeon has 8MB of (L3?) cache where the Core i5 only has 3MB, but I'm assuming that's L3. I don't know what the L1 and L2 cache structures look like.
My guess is that it's a BLAS issue. R may use its internal reference BLAS, depending on how it was compiled. In addition, different BLAS implementations can show significant performance differences (e.g., OpenBLAS vs. MKL).
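As a rough cross-check that the machine itself (rather than R's linear algebra setup) is healthy, the same workload shape can be timed outside R. This is a plain Lloyd's-algorithm sketch in NumPy on a 5000 x 5 matrix with k = 6, not the R code from the question:

```python
import time
import numpy as np

def kmeans_lloyd(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: assign points, recompute means, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels

X = np.random.default_rng(101).normal(0.5, 1.0, size=(5000, 5))
t0 = time.perf_counter()
centers, labels = kmeans_lloyd(X, k=6)
print(f"elapsed: {time.perf_counter() - t0:.3f}s")
```

If this runs in a fraction of a second on the Ubuntu box, the hardware and OS are fine and the suspicion falls back on the R build and its BLAS.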

How to make server respond faster on First Byte?

My website has seen ever-decreasing traffic, so I've been working to increase its speed and usability. On WebPageTest.org I've worked most of my grades up, but First Byte is still horrible.
F First Byte Time
A Keep-alive Enabled
A Compress Transfer
A Compress Images
A Progressive JPEGs
B Cache static
First Byte Time (back-end processing): 0/100
1081 ms First Byte Time
90 ms Target First Byte Time
I use the Rackspace Cloud Server system:
CentOS 6.4, 2 GB of RAM, 80 GB hard drive,
Next Generation Server
Linux 2.6.32-358.18.1.el6.x86_64
Apache/2.2.15 (CentOS)
MySQL 5.1.69
PHP 5.3.3 / Zend 2.3.0
Website system: TomatoCart shopping cart.
Any help would be much appreciated.
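A quick way to watch first-byte time outside of WebPageTest is to time the gap between sending a request and receiving the first response byte. A minimal sketch (the host in the usage line is a placeholder; substitute your own server):

```python
import socket
import time

def measure_ttfb(host, port=80, path="/"):
    """Return seconds from end of request to first response byte."""
    with socket.create_connection((host, port), timeout=10) as sock:
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\nConnection: close\r\n\r\n")
        sock.sendall(request.encode())
        start = time.monotonic()
        sock.recv(1)  # block until the first byte of the response arrives
        return time.monotonic() - start

# Placeholder host; substitute your own server:
# print(f"TTFB: {measure_ttfb('www.example.com'):.3f}s")
```

Since this excludes DNS and connection setup, a number close to the WebPageTest figure points at back-end processing (PHP/MySQL) rather than the network.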
Traceroute #1 to 198.61.171.121
Hop Time (ms) IP Address FQDN
0.855 - 199.193.244.67
0.405 - 184.105.250.41 - gige-g2-14.core1.mci3.he.net
15.321 - 184.105.222.117 - 10gigabitethernet1-4.core1.chi1.he.net
12.737 - 206.223.119.14 - bbr1.ord1.rackspace.NET
14.198 - 184.106.126.144 - corea.ord1.rackspace.net
14.597 - 50.56.6.129 - corea-core5.ord1.rackspace.net
13.915 - 50.56.6.111 - core5-aggr1501a-1.ord1.rackspace.net
16.538 - 198.61.171.121 - mail.aboveallhousplans.com
Following @JXH's advice, I did a packet capture and analyzed it using Wireshark.
During a hit-and-leave visit to the site I got 6 lines of bad TCP at about frames 28-33, warning that I have TCP Retransmission and TCP Dup ACK... 2 of each of these warnings, 3 times.
In the expanded panel for a Retransmission, the TCP analysis flags show "retransmission suspected", severity level "Note", and an RTO of 1.19 seconds.
In the expanded panel for a TCP Dup ACK, the TCP analysis flags show "duplicate ACK", severity level "Note", and an RTT of 0.09 seconds.
This is all gibberish to me...
I don't know if this is wise to do or not, but I've uploaded my packet capture dump file, in case anyone cares to take a look at my flags and let me know what they think.
I wonder if the retransmission warnings are saying that the HTTP response is sending duplicate information? I have a few things in twice, which seems a little redundant; for example, the Vary: User-Agent directive is duplicated:
# Set header information for proxies
Header append Vary User-Agent
# Set header information for proxies
Header append Vary User-Agent
The server's retransmissions and dup ACKs were fixed a few days ago, but the lag in the initial server response remains.
http://www.aboveallhouseplans.com/images/firstbyte001.jpg
http://www.aboveallhouseplans.com/images/firstbyte002.jpg
A first byte time of 600 ms remains...