Random String in linux by system time

I work with Bash. I want to generate a random string based on the system time. The length of the unique string must be between 10 and 30 characters. Can anybody help me?

There are many ways to do this; my favorite uses the urandom device:
burhan@sandbox:~$ tr -cd '[:alnum:]' < /dev/urandom | fold -w30 | head -n1
CCI4zgDQ0SoBfAp9k0XeuISJo9uJMt
tr (translate) makes sure that only alphanumeric characters get through
fold wraps the stream to a 30-character width
head keeps only the first line
To use the current system time (as you have this specific requirement):
burhan@sandbox:~$ date +%s | sha256sum | base64 | head -c30; echo
NDc0NGQxZDQ4MWNiNzBjY2EyNGFlOW
date +%s is our date-based seed
We run it through sha256sum and base64-encode the result to get a "random"-looking string
Finally we truncate it to 30 characters
Other ways (including the two listed above) are easy to find with a quick search.
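Since the question asks for a length between 10 and 30 characters, here is a minimal sketch combining the ideas above, assuming Bash (its built-in RANDOM picks the length; the variable name is just for illustration):
len=$(( (RANDOM % 21) + 10 ))   # random length from 10 to 30
tr -cd '[:alnum:]' < /dev/urandom | head -c "$len"; echo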

Maybe you can use uuidgen -t.
Generate a time-based UUID. This method creates a UUID based on the system clock plus the system's ethernet hardware address, if present.
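If you go the uuidgen route and still need to stay within 10 to 30 characters, a sketch (stripping the dashes and truncating; 20 is just an example length):
uuidgen -t | tr -d '-' | head -c 20; echo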

I recently put together a script to handle this. The output is a 33-digit md5 checksum, but you can trim it down with sed to between 10 and 30 characters.
E.g. gen_uniq_id.bsh | sed 's/\(.\{20\}\)\(.*$\)/\1/'
The script is fairly robust - it uses the current time to nanosecond precision, /dev/urandom, and mouse movement data, and it optionally allows changing the collection times for the random and mouse data.
It also has a -s option that allows an additional string argument to be incorporated, so you can random seed from anything.
https://code.google.com/p/gen-uniq-id/
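If the sed expression feels heavy, cut does the same trimming; a sketch assuming the script is on your PATH:
gen_uniq_id.bsh | cut -c 1-20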

Related

How to get unique lines from a very large file in linux?

I have a very large data file (255G; 3,192,563,934 lines). Unfortunately I only have 204G of free space on the device (and no other devices I can use). I did a random sample and found that in a given, say, 100K lines, there are about 10K unique lines... but the file isn't sorted.
Normally I would use, say:
pv myfile.data | sort | uniq > myfile.data.uniq
and just let it run for a day or so. That won't work in this case because I don't have enough space left on the device for the temporary files.
I was thinking I could use split, perhaps, and do a streaming uniq on maybe 500K lines at a time into a new file. Is there a way to do something like that?
I thought I might be able to do something like
tail -100000 myfile.data | sort | uniq >> myfile.uniq && trunc --magicstuff myfile.data
but I couldn't figure out a way to truncate the file properly.
Use sort -u instead of sort | uniq
This allows sort to discard duplicates earlier, and GNU coreutils is smart enough to take advantage of this.
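Since temporary-file space is the real constraint here, it can also help to give GNU sort a larger in-memory buffer and have it compress its temporary files; a sketch, not tested on data of this size:
sort -u -S 75% --compress-program=gzip myfile.data > myfile.data.uniq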

Documentation for uinput

I am trying very hard to find the documentation for uinput, but the only thing I have found is linux/uinput.h. I have also found some tutorials on the internet, but no documentation at all!
For example I would like to know what UI_SET_MSCBIT does but I can't find anything about it.
How do people know how to use uinput?
Well, it takes some investigation effort for such subtle things. From the
drivers/input/misc/uinput.c and include/uapi/linux/uinput.h files you can see the bits used in the UI_SET_* definitions, like this:
MSC
REL
LED
etc.
Run the next command in the kernel sources directory:
$ git grep --all-match -e 'MSC' -e 'REL' -e 'LED' -- Documentation/*
or use regular grep if your kernel tree doesn't have a .git directory:
$ grep -rl MSC Documentation/* | xargs grep -l REL | xargs grep -l LED
You'll get this file: Documentation/input/event-codes.txt, from which you can see:
EV_MSC: Used to describe miscellaneous input data that do not fit into other types.
EV_MSC events are used for input and output events that do not fall under other categories.
A few EV_MSC codes have special meaning:
MSC_TIMESTAMP: Used to report the number of microseconds since the last reset. This event should be coded as an uint32 value, which is allowed to wrap around with no special consequence. It is assumed that the time difference between two consecutive events is reliable on a reasonable time scale (hours). A reset to zero can happen, in which case the time since the last event is unknown. If the device does not provide this information, the driver must not provide it to user space.
I'm afraid this is the best you can find out there for UI_SET_MSCBIT.

Is it possible to display the progress of a sort in linux?

My job involves a lot of sorting fields from very large files. I usually do this with the sort command in bash. Unfortunately, when I start a sort I am never really sure how long it is going to take. Should I wait a second for the results to appear, or should I start working on something else while it runs?
Is there any possible way to get an idea of how far along a sort has progressed or how fast it is working?
$ cut -d , -f 3 VERY_BIG_FILE | sort -du > output
No, GNU sort does not do progress reporting.
However, if you are using sort just to remove duplicates, and you don't actually care about the ordering, then there's a more scalable way of doing that:
awk '! a[$0]++'
This writes out the first occurrence of a line as soon as it's been seen, which can give you an idea of the progress.
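For example, applied to the pipeline from the question (a sketch; the output is in order of first appearance rather than sorted, and you can watch the output file grow with wc -l in another terminal):
cut -d , -f 3 VERY_BIG_FILE | awk '! a[$0]++' > output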
You might want to give pv a try, it should give you a pretty good idea of what is going on in your pipe in terms of throughput.
Example (untested) injecting pv before and after the sort command to get an idea of the throughput:
$ cut -d , -f 3 VERY_BIG_FILE | pv -cN cut | sort -du | pv -cN sort > output
EDIT: I missed the -u in your sort command, so calculating lines first to be able to get a percentage output is void. Removed that part from my answer.
You can execute your sort in the background; you will get the prompt back and can do other jobs:
$ sort ...... &   # & means run in the background
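For example (a sketch; big_file and output are placeholder names):
$ sort big_file > output &
$ jobs      # check whether the sort is still running
$ fg %1     # bring it back to the foreground if needed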

grep limited characters - one line

I want to look up a word in multiple files and return only a single line per result, or a limited number of characters (40~80 characters, for example), and not the entire line as by default. For example, I would like to see:
grep -sR 'wp-content' .
file_1.sql:3309:blog/wp-content
file_1.sql:3509:blog/wp-content
file_2.sql:309:blog/wp-content
Currently I see the following:
grep -sR 'wp-content' .
file_1.sql:3309:blog/wp-content-Progressively predominate impactful systems without resource-leveling best practices. Uniquely maximize virtual channels and inexpensive results. Uniquely procrastinate multifunctional leadership skills without visionary systems. Continually redefine prospective deliverables without.
file_1.sql:3509:blog/wp-content-Progressively predominate impactful systems without resource-leveling best practices. Uniquely maximize virtual channels and inexpensive results. Uniquely procrastinate multifunctional leadership skills without visionary systems. Continually redefine prospective deliverables without.
file_2.sql:309:blog/wp-content-Progressively predominate impactful systems without resource-leveling best practices. Uniquely maximize virtual channels and inexpensive results. Uniquely procrastinate multifunctional leadership skills without visionary systems. Continually redefine prospective deliverables without.
You could use a combination of grep and cut
Using your example I would use:
grep -sRn 'wp-content' .|cut -c -40
grep -sRn 'wp-content' .|cut -c -80
That would give you the first 40 or 80 characters respectively.
edit:
Also, there's a flag in grep that you could use:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines.
Combining this with what I previously wrote:
grep -sRnm 1 'wp-content' .|cut -c -40
grep -sRnm 1 'wp-content' .|cut -c -80
That should give you the first time it appears per file, and only the first 40 or 80 chars.
egrep -Rso '.{0,40}wp-content.{0,40}' *.sh
This will not call the Radio-Symphonie-Orchestra, but -o(nly matching).
A maximum of 40 characters before and behind your pattern. Note: *e*grep.
If you change the regex to '^.*wp-content' you can use egrep -o. For example,
egrep -sRo '^.*wp-content' .
The -o flag makes egrep print only the portion of the line that matches, so matching from the start of the line to wp-content should yield the sample output in your first code block.
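On systems where egrep is deprecated, the equivalent with plain grep should be (a sketch):
grep -sRoE '^.*wp-content' .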

How can the extension of a PCR value be replicated with e.g. sha1sum?

This is somewhat related to the post in:
Perform OR on two hash outputs of sha1sum
I have a sample set of TPM measurements, e.g. the following:
10 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a ima 0ea26e75253dc2fda7e4210980537d035e2fb9f8 boot_aggregate
10 7f36b991f8ae94141753bcb2cf78936476d82f1d ima d0eee5a3d35f0a6912b5c6e51d00a360e859a668 /init
10 8bc0209c604fd4d3b54b6089eac786a4e0cb1fbf ima cc57839b8e5c4c58612daaf6fff48abd4bac1bd7 /init
10 d30b96ced261df085c800968fe34abe5fa0e3f4d ima 1712b5017baec2d24c8165dfc1b98168cdf6aa25 ld-linux-x86-64.so.2
According to the TPM spec, also referred to in the above post, the PCR extend operation is: PCR := SHA1(PCR || data), i.e. "concatenate the old value of PCR with the data, hash the concatenated string and store the hash in PCR". Also, the spec and multiple papers and presentations I have found mention that data is a hash of the software to be loaded.
However, when I do an operation like echo H(PCR)||H(data) | sha1sum, I do not obtain a correct resulting value. I.e., when calculating (using the above hashes): echo 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a0ea26e75253dc2fda7e4210980537d035e2fb9f8 | sha1sum, the resulting value is NOT 7f36b991f8ae94141753bcb2cf78936476d82f1d.
Is my understanding of the TPM_Extend operation correct? If so, why is the resulting hash different from the one in the sample measurement file?
Thanks!
To answer your very first question: your understanding of the extend operation is more or less correct. But you have 2 problems:
You are misinterpreting the things you have copied in here
You can't calculate hashes the way you are doing it on the shell
The log output you provided here is from Linux's IMA. According to the
documentation the first hash is template-hash and defined as
template-hash: SHA1(filedata-hash | filename-hint)
filedata-hash: SHA1(filedata)
So for the first line: SHA1(0ea26e75253dc2fda7e4210980537d035e2fb9f8 | "boot_aggregate")
results in 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a.
Note that the filename-hint is 256 bytes long - it is 0-padded at the end.
(thumbs up for digging this out of the kernel source ;))
So to make it clear: there are no PCR values in your log.
I wrote something in Ruby to verify my findings:
require 'digest/sha1'
filedata_hash = ["0ea26e75253dc2fda7e4210980537d035e2fb9f8"].pack('H*')
filename_hint = "boot_aggregate".ljust(256, "\x00")
puts Digest::SHA1.hexdigest(filedata_hash + filename_hint)
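The same check can be done directly on the shell, a sketch assuming xxd is available for the hex decoding (head -c 242 /dev/zero supplies the 0-padding that brings "boot_aggregate" up to 256 bytes); it should reproduce the 1ca03ef9... template-hash from the first log line:
{ echo -n 0ea26e75253dc2fda7e4210980537d035e2fb9f8 | xxd -r -p; printf 'boot_aggregate'; head -c 242 /dev/zero; } | sha1sum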
Now to your commands:
The way you are using it here, you are interpreting the hashes as ASCII strings.
Also note that echo will add an additional newline character to the output.
The character sequence 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a is hexadecimal
encoding of 160 bit binary data - a SHA1 hash value. So basically you are right,
you have to concatenate the two values and calculate the SHA1 of the resulting
320 bits of data.
So the correct command for the command line would be something like
printf "\x1c\xa0\x3e\xf9\xcc\xa9\x8b\x0a\x04\xe5\xb0\x1d\xab\xe1\xff\x82\x5f\xf0\x28\x0a\x0e\xa2\x6e\x75\x25\x3d\xc2\xfd\xa7\xe4\x21\x09\x80\x53\x7d\x03\x5e\x2f\xb9\xf8" | sha1sum
The \xXX in the printf string will convert the hex code XX into one byte of
binary output.
This will result in the output of d14f958b2804cc930f2f5226494bd60ee5174cfa,
and that's fine.
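The same concatenation can be written a bit more readably with xxd, if it is installed (a sketch; it should print the same d14f95... value):
{ echo -n 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a; echo -n 0ea26e75253dc2fda7e4210980537d035e2fb9f8; } | xxd -r -p | sha1sum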
