csv data for splitting into different files - linux

25422572,2018-04-01,00:00:27,e7961e25-5f46-4c81-b85d-36ce404bf72e,891672
25422631,2018-04-01,00:01:21,41afad62-c037-4bed-9568-f76f3a86eb10,891672,
I have data like that shown above, running up to 2018-04-30. I want to split it into 3 files of 10 days each. How can I do this with Linux commands?

You could use grep:
grep -E "2018-04-(0[1-9]|10)" input.txt > out1.txt
grep -E "2018-04-(1[1-9]|20)" input.txt > out2.txt
grep -E "2018-04-(2[1-9]|30)" input.txt > out3.txt

Related

Concatenation of a huge number of selected files from a directory in Shell

I have more than 50000 files in a directory, such as file1.txt, file2.txt, ....., file50000.txt. I would like to concatenate some files whose numbers are listed in the following text file (need.txt).
need.txt
1
4
35
45
71
.
.
.
I tried the following. Though it works, I am looking for a simpler and shorter way.
n1=1
n2=$(wc -l < need.txt)
while [ "$n1" -le "$n2" ]
do
  # pick the n1-th file number from need.txt
  f1=$(awk -v n="$n1" 'NR==n {print $1}' need.txt)
  cat "file$f1.txt" >> out.txt
  (( n1++ ))
done
This might also work for you:
sed 's/.*/file&.txt/' < need.txt | xargs cat > out.txt
Something like this should work for you:
sed -e 's/.*/file&.txt/' need.txt | xargs cat > out.txt
It uses sed to turn each line into the corresponding file name and then passes the names to xargs, which hands them to cat.
Using awk it could be done this way:
awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
This adds each file to awk's ARGV array of files to process and then prints every line it sees.
It is also possible to do it without any sed or awk, using only a bash loop and cat (of course).
for i in $(cat need.txt); do cat "file${i}.txt" >> out.txt; done
And, as you wanted, it is quite simple.
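For the record, the same idea fits on one line with printf and xargs (a sketch, assuming the numbers in need.txt contain no whitespace or glob characters):
printf 'file%s.txt\n' $(<need.txt) | xargs cat > out.txt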

How to find which line is missing in another file

On a Linux box, I have one file as below:
A.txt
1
2
3
4
The second file is as below:
B.txt
1
2
3
6
I want to know what is in A.txt but not in B.txt, i.e. it should print the value 4. I want to do that on Linux.
awk 'NR==FNR{a[$0]=1;next} !a[$0]' B.txt A.txt
While reading the first file (NR==FNR), this marks every line in the array a; while reading the second, it prints only the lines that were not marked. I didn't test it; give it a try.
Use comm if the files are sorted as your sample input shows:
$ comm -23 A.txt B.txt
4
If the files are unsorted, see @Kent's awk solution.
You can also do this using grep by combining the -v (show non-matching lines), -x (match whole lines) and -f (read patterns from file) options:
$ grep -v -x -f B.txt A.txt
4
This does not depend on the order of the files - it will remove any lines from A that match a line in B.
(An addition to @rjmunro's answer)
The proper way to use grep for this is:
$ grep -F -v -x -f B.txt A.txt
4
Without the -F flag, grep interprets the patterns read from B.txt as basic regular expressions (BREs), which is undesired here and can cause trouble. The -F flag makes grep treat the patterns as a set of newline-separated fixed strings. For instance:
$ cat A.txt
&
^
[
]
$ cat B.txt
[
^
]
|
$ grep -v -x -f B.txt A.txt
grep: B.txt:1: Invalid regular expression
$ grep -F -v -x -f B.txt A.txt
&
Using diff:
diff --changed-group-format='%<' --unchanged-group-format='' A.txt B.txt
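If the files are unsorted, comm can still be applied via bash process substitution (a sketch; note the output comes out in sorted order rather than the original order):
comm -23 <(sort A.txt) <(sort B.txt)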

Extract strings in a text file using grep

I have file.txt with names one per line as shown below:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
....
I want to grep each of these from an input.txt file.
Manually, I do this one at a time:
grep "ABCB8" input.txt > output.txt
Could someone help me automatically grep all the strings in file.txt from input.txt and write the results to output.txt?
You can use the -f flag as described in Bash, Linux, Need to remove lines from one file based on matching content from another file
grep -o -f file.txt input.txt > output.txt
Flags:
-f FILE, --file=FILE:
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with
each such part on a separate output line.
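Note that -o prints only the matched names themselves; to reproduce the manual grep calls in the question, which print whole matching lines, you could drop -o and add -F so the names are matched as fixed strings rather than regular expressions (a sketch):
grep -F -f file.txt input.txt > output.txt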
for line in `cat text.txt`; do grep $line input.txt >> output.txt; done
Contents of text.txt:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
Edit:
A safer solution with while read:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done
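As a side note, read without -r and with the default IFS mangles backslashes and trims surrounding whitespace; a slightly more robust form of the same loop, as a sketch:
while IFS= read -r line; do grep "$line" input.txt; done < text.txt > output.txt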
Edit 2:
Sample text.txt:
ABCB8
ABCB8XY
ABCC12
Sample input.txt:
You were hired to do a job; we expect you to do it.
You were hired because ABCB8 you kick ass;
we expect you to kick ass.
ABCB8XY You were hired because you can commit to a rational deadline and meet it;
ABCC12 we'll expect you to do that too.
You're not someone who needs a middle manager tracking your mouse clicks
If you don't care about the order of lines, the quick workaround is to pipe the result through sort | uniq:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done; cat output.txt | sort | uniq > output2.txt
The result is then in output2.txt.
Edit 3:
cat text.txt | while read line; do grep "\<${line}\>" input.txt >> output.txt; done
Is that fine?
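A loop-free alternative uses grep's -w (match whole words) flag together with -f, which anchors each name at word boundaries much like \< and \> (a sketch):
grep -w -f text.txt input.txt > output.txt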

Linux cut string

In Linux (CentOS) I have a file that contains additional information that I want removed. I want to generate a new file with all characters up to the first |.
The file has the following information:
ALFA12345|7890
Beta0-XPTO-2|30452|90 385|29
ZETA2334423 435; 2|2|90dd5|dddd29|dqe3
The output expected will be:
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
That is, remove all characters from the first | onward (inclusive).
Any suggestions for a script that reads File1 and generates File2 with this specific requirement?
Try
cut -d'|' -f1 oldfile > newfile
And, to round out the "big 3", here's the awk version:
awk -F\| '{print $1}' in.dat
You can use a simple sed script.
sed 's/^\([^|]*\).*/\1/g' in.dat
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
Redirect to a file to capture the output.
sed 's/^\([^|]*\).*/\1/g' in.dat > out.dat
And with grep:
$ grep -o '^[^|]*' file1
ALFA12345
Beta0-XPTO-2
ZETA2334423 435; 2
$ grep -o '^[^|]*' file1 > file2
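For completeness, this can also be done in pure bash with parameter expansion, where ${line%%|*} strips everything from the first | onward (a sketch, using the same file1 and file2 names):
while IFS= read -r line; do printf '%s\n' "${line%%|*}"; done < file1 > file2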

select multiple lines using the linux command sed

I have an example [file] from which I want to grab lines 3-6 and lines 11-13, then sort them with a one-line command and save the result as 3_6-11_13. These are the commands I have used thus far, but I haven't gotten the desired output:
sed -n '/3/,/6/p'/11/,/13/p file_1 > file_2 | sort -k 2 > file_2 & sed -n 3,6,11,13p file_1 > file_2 | sort -k 2 file_2.
Is there a better way to shorten this? I have thought about using awk, but I have stayed with sed so far.
With sed you're allowed to specify addresses by number like so:
sed -n '3,6p'
The -n is to keep sed from automatically printing output.
Then you can run multiple commands, at least with GNU sed, by separating them with semicolons:
sed -n '3,6p; 11,13p' file_1 | sort -k2 > 3_6-11_13
sed can also combine multiple commands using the -e option:
$ sed -e 'comm' -e 'comm' file.txt
or you can separate commands with semicolons:
$ sed 'comm;comm;comm' file.txt
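Applied to the question, either form yields the requested file in one line (a sketch using the question's file_1 and output name):
sed -n -e '3,6p' -e '11,13p' file_1 | sort -k2 > 3_6-11_13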
