How could I use "tr" to translate every byte? - linux

My goal is to XOR every byte of an input file with 42 using tr.
But I am stuck at this point:
tr '\0-\377' '?'
Can anyone help? Thanks a lot.
Some restrictions:
The translation has to be done by tr.
We are allowed to use a bash script, but it must not use any temporary files (in other words, pipelines only).

This isn't possible with tr alone since it, as the name says, simply translates from one set of characters to another. Math calculations or logical operations are not supported.
Btw, if you want to address the whole byte range using numeric values, you are bound to octal numbers. The range would be \0-\377 in that case. But anyway, calculating the xor value is not possible.
What you can do is prepare a table of all byte values XORed with 42 and use it as SET2. I'm using python to create that list:
xor.py
v = []
for i in range(256):
    # always emit three octal digits: tr reads at most three digits after the backslash
    v.append("\\%03o" % (i ^ 42))
print("".join(v))
Or simply:
print("".join("\\%03o" % (i ^ 42) for i in range(256)))
Then use that to create SET2 for tr:
tr '\0-\377' "$(python xor.py)" < input.file
Note: If python is required anyway, why not use python for the whole solution?
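For comparison, here is a minimal sketch of what the all-Python route from the note above might look like. The helper name xor_bytes and the sample bytes are illustrative and not part of the original answer. It builds the same 256-entry lookup table that SET2 encodes for tr and applies it with bytes.translate.

```python
# Illustrative sketch: XOR every byte with a key through a 256-entry table.
# `xor_bytes` is a hypothetical helper name, not from the original answer.

def xor_bytes(data: bytes, key: int = 42) -> bytes:
    # Entry i of the table holds i ^ key, which is what SET2 spells out for tr.
    table = bytes(i ^ key for i in range(256))
    return data.translate(table)

sample = b"Hello\x00\xff"
encoded = xor_bytes(sample)
# XOR with a fixed key is its own inverse, so a second pass restores the input.
decoded = xor_bytes(encoded)
```

In a pipeline, the same function could be fed from sys.stdin.buffer and its result written to sys.stdout.buffer.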
Edited by Mark Setchell
As Peter pointed out, this list can of course be generated in advance and then passed as a literal string for SET2. This avoids the python call at runtime. Like this:
tr '\0-\377' '\052\053\050\051\056\057\054\055\042\043\040\041\046\047\044\045\072\073\070\071\076\077\074\075\062\063\060\061\066\067\064\065\012\013\010\011\016\017\014\015\002\003\000\001\006\007\004\005\032\033\030\031\036\037\034\035\022\023\020\021\026\027\024\025\152\153\150\151\156\157\154\155\142\143\140\141\146\147\144\145\172\173\170\171\176\177\174\175\162\163\160\161\166\167\164\165\112\113\110\111\116\117\114\115\102\103\100\101\106\107\104\105\132\133\130\131\136\137\134\135\122\123\120\121\126\127\124\125\252\253\250\251\256\257\254\255\242\243\240\241\246\247\244\245\272\273\270\271\276\277\274\275\262\263\260\261\266\267\264\265\212\213\210\211\216\217\214\215\202\203\200\201\206\207\204\205\232\233\230\231\236\237\234\235\222\223\220\221\226\227\224\225\352\353\350\351\356\357\354\355\342\343\340\341\346\347\344\345\372\373\370\371\376\377\374\375\362\363\360\361\366\367\364\365\312\313\310\311\316\317\314\315\302\303\300\301\306\307\304\305\332\333\330\331\336\337\334\335\322\323\320\321\326\327\324\325' < inputFile > outputFile

Related

Looking for the best way in bash shell to extract a string

I have the following string being exported from a program that is analyzing the certificate on a website which will be part of a bugfix analysis
CERT_SUMMARY:127.0.0.1:127.0.0.1:631:sha256WithRSAEncryption:
/O=bfcentos7-test/CN=bfcentos7-test/emailAddress=root$bfcentos7-
test:/O=bfcentos7-test/CN=bfcentos7-test/emailAddress=root$bfcentos7-
test:170902005715Z:270831005715Z:self signed certificate
(consider output above to be a single line)
What I need is the best way in a bash shell to extract the sha256WithRSAEncryption. This could be anything like sha384withRSAEncryption or something else.
After CERT_SUMMARY there will always be 127.0.0.1:127.0.0.1:portnum. In the example above the port is 631, but it could be anything.
This runs internally on a system and returns this string along with SSL or TLS (not pictured)
Here is another example of a return
CERT_SUMMARY:127.0.0.1:127.0.0.1:52311:sha256WithRSAEncryption:
/CN=ServerSigningCertificate_0/name=Type`Administrator
/name=DBName`ServerSigningCertificate_0:/C=US/CN=BLAHBLAH/
ST=California/L=Address, Emeryville CA 94608/O=IBM BigFix Evaluation
License/OU=Customer/emailAddress=blahblay#gmail.com/name=
Hash`sha1/name=Server`bigfix01/name=CustomActions`Enable
/name=LicenseAllocation`999999/name=CustomRetrievedProperties`Enable:
170702212459Z:270630212459Z:unable to get local issuer certificate
Thanks in advance.
Novice at shell programming, but learning!!
You ask for the best way and yet do not seem to provide the best description: "This could be anything like sha384withRSAEncryption or something else."
Given the examples, the string you are looking for is the 5th field when : is the separator, so this command should be OK:
cut -f5 -d":"
If the output string had a strict length format, one easy option would be the 'cut' command with -c. This is not the case though, since the port number varies in length.
CERT_SUMMARY:127.0.0.1:127.0.0.1:631:sha256WithRSAEncryption:
As @cyrus pointed out, this was as simple as picking the right column with awk... I am learning.
This worked
awk -F ":" '/CERT_SUMMARY/ {print $5}'
Thanks for the help!!
| sed -E 's/^([^:]*:){4}([^:]*):.*/\2/'
Regular expressions are your friend. If there is one thing one really should be familiar with if one needs to do a lot of string parsing or string processing, it's definitely regular expressions.
echo 'CERT_SUMMARY:127.0.0.1:127.0.0.1:52311:sha256WithRSAEncryption:/CN=ServerSigningCertificate_0/name=Type`Administrator/name=DBName`ServerSigningCertificate_0:/C=US/CN=BLAHBLAH/ST=California/L=Address, Emeryville CA 94608/O=IBM BigFix Evaluation License/OU=Customer/emailAddress=blahblay#gmail.com/name=Hash`sha1/name=Server`bigfix01/name=CustomActions`Enable/name=LicenseAllocation`999999/name=CustomRetrievedProperties`Enable:170702212459Z:270630212459Z:unable to get local issuer certificate' | sed -E 's/^([^:]*:){4}([^:]*):.*/\2/'
prints
sha256WithRSAEncryption
It's probably a bit overkill here, but there is almost nothing that cannot be done with regular expressions and as you have also built-in regex support in many languages today, knowing regex is never going to be a waste of time.
See also here to get a nice explanation of what each regex expression actually means, including an interactive editing view. Basically I'm telling the regex parser to skip the first 4 fields, each consisting of any number of characters that are not : followed by a single :, then capture the 5th field, which consists of any number of characters that are not :, and finally match anything else (no matter what) to the end of the string. The whole regex is part of a sed "replace" operation, where I replace the whole string with just the content captured by the second capture group (everything in round parentheses is a capture group).
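The same idea can be sketched in Python for readers who want to experiment with the pattern. The sample line below is shortened from the question, and the variable names are illustrative. The repeated group is made non-capturing here, so the wanted field ends up in group 1 rather than group 2 as in the sed version.

```python
import re

# Shortened sample line based on the question's first example.
line = ("CERT_SUMMARY:127.0.0.1:127.0.0.1:631:sha256WithRSAEncryption:"
        "/O=bfcentos7-test/CN=bfcentos7-test:170902005715Z:270831005715Z:"
        "self signed certificate")

# Skip four colon-terminated fields, capture the fifth, ignore the rest.
match = re.match(r"^(?:[^:]*:){4}([^:]*):", line)
algorithm = match.group(1) if match else None

# Splitting on ':' reaches the same field without a regex.
by_split = line.split(":")[4]
```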
You could also use the following, which does not print by field number, so it still works if the sha256 string sits in a different position in your Input_file than in the example shown.
awk '{match($0,/sha.*Encryption:/);if(substr($0,RSTART,RLENGTH)){print substr($0,RSTART,RLENGTH-1)}}' Input_file
Pipe the output to:
awk 'BEGIN{FS=":"} {print $5}'
You could also take a step back to the openssl x509 command 'name options'. Using sep_comma_plus avoids the slashes in the output and therefore your regex will be simpler.

Substitute string by index without using regular expressions

It should be very easy, but I am looking for an efficient way to perform it.
I know that I could split the string into two parts and insert the new value, but I have tried to substitute each line between the indexes 22-26 as follows:
line.replace(line[22:26],new_value)
The Problem
However, that function substitutes everything in the line that is similar to the pattern in line[22:26].
In the example below, I want to replace the marked number 1 with number 17:
Here are the results. Note the replacement of 1 with 17 in several places:
Thus I don't understand the behavior of replace command. Is there a simple explanation of what I'm doing wrong?
Why I don't want RE
The values between index 22-26 are not unified in form.
Note: I am using python 3.5 on Unix/Linux machines.
str.replace replaces 1 sub-string pattern with another everywhere in the string.
e.g.
'ab cd ab ab'.replace('ab', 'xy')
# produces output 'xy cd xy xy'
similarly,
mystr = 'ab cd ab ab'
mystr.replace(mystr[0:2], 'xy')
# also produces output 'xy cd xy xy'
What you could do instead, to replace just the characters in positions 22-26:
line = line[0:22] + new_value + line[26:]
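To make the difference concrete, here is a small sketch on a made-up record. The question's real data is not shown, so the line below is purely hypothetical and is built so that the text at positions 22-26 also occurs earlier in the line.

```python
# Hypothetical fixed-width record: "0001" appears at index 3 and again at 22.
line = "id=0001 name=x".ljust(22) + "0001" + " tail"
new_value = "0017"

# str.replace swaps every occurrence of the substring, not just the slice.
wrong = line.replace(line[22:26], new_value)

# Slicing rebuilds the line around the exact index range instead.
right = line[:22] + new_value + line[26:]
```

Here wrong has both copies of 0001 changed, while right leaves the id field alone and only changes positions 22-26.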
Also, looking at your data, it seems to me to be a fixed-width text file. While my suggestion will work, a more robust way to process this data would be to read it & separate the different fields in the record first, before processing the data.
If you have access to the pandas library, it provides a useful function just for reading fixed-width files (pandas.read_fwf).

Finding substring of variable length in bash

I have a string, such as time=1234, and I want to extract just the number after the = sign. However, this number could be in the range of 0 and 100000 (eg. - time=1, time=23, time=99999, etc.).
I've tried things like ${string:5:8}, but this will only work for examples of a certain length.
How do I get the substring of everything after the = sign? I would prefer to do it without outside commands like cut or awk, because I will be running this script on devices that may or may not have that functionality. I know there are examples out there using outside functions, but I am trying to find a solution without the use of such.
s=time=1234
time_int=${s##*=}
echo "The content after the = in $s is $time_int"
This is a parameter expansion that removes the longest match of *= from the front of the variable's value -- thus, everything up to and including the last =.
If intending this to be non-greedy (that is, to remove only content up to the first = rather than the last =), use ${s#*=} -- a single # rather than two.
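The difference between the two forms only shows up when the value holds more than one =. As a rough illustration in Python (an analogue only, since the bash forms above need no external tools), with a made-up value a=b=c:

```python
# Rough Python analogue of the two bash expansions, for illustration only.
s = "a=b=c"

# Like ${s##*=}: drop everything through the last '='.
after_last = s.rsplit("=", 1)[-1]

# Like ${s#*=}: drop everything through the first '='.
after_first = s.split("=", 1)[-1]
```

For a value with a single =, such as time=1234, both forms give the same result.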
References:
The bash-hackers page on parameter expansion
BashFAQ #100 ("How do I do string manipulations in bash?")
BashFAQ #73 ("How can I use parameter expansion? How can I get substrings? [...]")
BashSheet quick-reference, parameter expansion section
if time= part is constant you can remove prefix by using ${str#time=}
Let's say you have str='time=123123' if you execute echo ${str#time=} you would get 123123

AWK Numeric Variable treated as string

[Ubuntu 14.04, GNU Awk 4.0.1]
I have a strange problem... I am assigning a numeric value, that is retrieved from an input file, to a custom variable. When I print it, it displays correctly, and printing its length displays the right number of digits.
However, when I use the variable in a loop, my loop stops when index becomes greater than the most significant digit of my variable.
I have tried a For Loop, and now a While Loop, both suffer the same problem.
With the file I'm processing, samples contains the value 8092, and the loop stops on the 9th iteration.
#!/usr/bin/awk -f
BEGIN {
    samples = 0;
}
{
    ...
    samples = $24;
}
END {
    i = 1;
    while (i <= samples) {
        if (i > samples) { print "This is the end.\n " i " is bigger than " samples; }
        i++;
    }
}
I am very new to AWK, and can't see why this is occurring. After reading a number of tutorials, I'm under the impression that AWK is able to convert between string & numeric representations of numbers as required.
Can someone help me see what I've done wrong?
Solution
The answer was, as JNevill & ghoti suggested, to add 0 to the variable. In my case, the best place was just before the loop, as samples is rewritten during the body of the AWK script. Thanks.
Awk doesn't exactly "convert" between representations, it simply uses whatever you give it, adjusting context based on usage. Thus, when evaluating booleans, any non-zero number evaluates to TRUE, and any non-empty string evaluates to TRUE.
I can't see what's really in your samples variable, but if you want to force things to be evaluated as a number before you start your loop, you might be able to simply add zero to the variable, i.e.:
samples = $24 + 0;
Also, if your source data came from a DOS/Windows machine and has line endings that include carriage returns (\r\n), and $24 is the last field on each line, then you may be comparing i against a value such as 8092\r, which is likely not to give you the results you expect.
To see what's really in your input data, try:
cat -vet samples | less
If you see ^M before the $ at the end of each line, then your input file contains carriage returns, and you should process it appropriately before asking awk to parse its content.
In fact, I think it's pretty clear that since your input data begins with the character "8" and your loop stops on the 9th iteration, your comparison of i to samples is one of strings rather than numbers.
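The effect can be reproduced outside awk. The sketch below uses Python's string ordering as a stand-in for awk's string comparison, and assumes the field holds 8092 followed by a carriage return. The helper stops_at is hypothetical and only mimics the loop condition from the question.

```python
# Assumed field content: the digits plus a trailing carriage return.
samples_text = "8092\r"

def stops_at(condition):
    # Returns the first i for which the loop condition `i <= samples` fails.
    i = 1
    while condition(i):
        i += 1
    return i

# Compared as strings, "9" sorts after "8092\r", so the loop ends early.
as_strings = stops_at(lambda i: str(i) <= samples_text)

# Compared as numbers, the loop runs through all 8092 values.
as_numbers = stops_at(lambda i: i <= int(samples_text))
```

Under these assumptions the string comparison fails as soon as i reaches 9, which matches the behaviour described in the question.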
awk decides the type of a variable depending on what value is held in the variable. You can force it to type the way you want, though it's a bit hacky (isn't everything though).
Try adding 0 to your variable before hitting the loop. samples = samples + 0, for instance. Now no matter what awk thought before you hit that line, it will now treat your number as a number and your loop should execute as expected.
Odd though that it was executing at all and stopping at 9 iterations.... It suggests that perhaps it is already treating it correctly and you may be assuming that the value is 8092, when it is, in fact 9. Also, that printed bit inside your for loop should never execute. Hopefully it doesn't output that.

Using sed to drop strings with repeated and incremental characters?

I'm trying to use sed to drop strings containing repeated characters before appending them to a file.
So far I have this, to drop strings with consecutive repetition like 'AA' or '22', but I'm struggling with full string repetition and incremental characters.
generic string generator | sed '/\([^A-Za-z0-9_]\|[A-Za-z0-9]\)\1\{1,\}/d' >> parsed string to file
I also want to drop strings containing any repetition like 'ABA'.
As well as strings containing any ascending or descending characters like 'AEF' or 'AFE'.
I'm assuming it would be easier to use multiple passes of sed to drop the unwanted strings.
** A little more information to try to avoid the XY problem mentioned. **
The character strings could be from 8 to 64 in length, but in this instance I'm focusing on 8. While at the same time I've restricted the string generation to only output an upper-case alpha string (A-Z). This is for a few reasons, but mainly that I don't want the generated file to have a ridiculously huge footprint.
With the first pass of sed dropping unnecessary outputs like 'AAAAAAAA' and 'AAAAAAAB' from the stream. This results in the file starting with strings 'ABABABAB' and 'ABABABAC'.
Next pass I want to check that from one character to the next doesn't increase or decrease by a value of one. So strings like 'ABABABAB' would be dropped, but 'ACACACAC' would parse to the stream.
Next pass I want to drop strings that contain any repeated characters in the whole string. So strings like 'ACACACAC' would be dropped, but 'ACEBDFHJ' would parse to the file.
Hope that helps.
In order to do what you're describing with sed, you'd need to run it many times. Since sed doesn't understand the concept of "this character is incremental from this other character", you need to run it across all possible combinations:
sed '/AB/d'
sed '/BC/d'
sed '/CD/d'
sed '/DE/d'
etc.
For descending characters, the same thing:
sed '/BA/d'
sed '/CB/d'
In order to then drop strings with repeated characters, you can do something like this:
sed '/\(.\).*\1/d'
The following should do the trick:
generic string generator |sed '/\(.\).*\1/d'|sed /BA/d|sed /AB/d|sed /CB/d|sed /BC/d|sed /DC/d|sed /CD/d|sed /ED/d|sed /DE/d|sed /FE/d|sed /EF/d|sed /GF/d|sed /FG/d|sed /HG/d|sed /GH/d|sed /IH/d|sed /HI/d|sed /JI/d|sed /IJ/d|sed /KJ/d|sed /JK/d|sed /LK/d|sed /KL/d|sed /ML/d|sed /LM/d|sed /NM/d|sed /MN/d|sed /ON/d|sed /NO/d|sed /PO/d|sed /OP/d|sed /QP/d|sed /PQ/d|sed /RQ/d|sed /QR/d|sed /SR/d|sed /RS/d|sed /TS/d|sed /ST/d|sed /UT/d|sed /TU/d|sed /VU/d|sed /UV/d|sed /WV/d|sed /VW/d|sed /XW/d|sed /WX/d|sed /YX/d|sed /XY/d|sed /ZY/d|sed /YZ/d
I only tested this on a few input samples, but they all seemed to work.
Note that this is quite ungainly, and would be better done by something a little more sophisticated than sed. Here's a sample in python:
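import math
def isvalid(x):
    # reject strings in which any character occurs more than once
    if len(set(x)) < len(x):
        return False
    # reject strings with adjacent characters one code point apart (e.g. AB or BA)
    for a in range(1, len(x)):
        if math.fabs(ord(x[a]) - ord(x[a-1])) == 1:
            return False
    return True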
This is much more readable than the giant set of sed calls, and has the same functionality.
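As a usage sketch, the same two checks can be applied to a list of candidate strings. The function name is_valid and the candidate list are illustrative. The repeated-character check is written here with a backreference so that it mirrors the sed pattern, which is a swapped-in variant rather than the answer's own code.

```python
import re

def is_valid(s: str) -> bool:
    # A character that occurs twice anywhere, same idea as sed '/\(.\).*\1/d'.
    if re.search(r"(.).*\1", s):
        return False
    # Adjacent characters whose code points differ by exactly one (AB, BA, ...).
    return all(abs(ord(a) - ord(b)) != 1 for a, b in zip(s, s[1:]))

# Sample strings taken from the question, plus one extra made-up candidate.
candidates = ["AAAAAAAB", "ABABABAB", "ACACACAC", "ACEBDFHJ", "ACEGIKMO"]
kept = [s for s in candidates if is_valid(s)]
```

Only the last two candidates survive both checks, in line with the question's expectation that 'ACEBDFHJ' should reach the file.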

Resources