Replacing a column value based on a line containing a specific string

I have a text file:
file_11199_name 45 69
file_11176_name 45 69
file_11156_name 45 69
where I want to change the value of column three to 1 when the first column contains "11199".

The following three lines of AWK code seem to do what you need:
{c=$3}
$1~11199{c=1}
{ print $1,$2,c }
Line 1 assigns the variable c the value of the third column.
Line 2 assigns the value 1 to c if the first field contains 11199 ($1~11199).
Line 3 prints the output.
$ awk '{c=$3}$1~11199{c=1}{ print $1,$2,c }' file
file_11199_name 45 1
file_11176_name 45 69
file_11156_name 45 69
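The same thing can be done by modifying the field in place; a minimal sketch, using an explicit regex (note that assigning to $3 makes awk rebuild the line with single spaces, which is harmless for data like this):

```shell
awk '$1 ~ /11199/ { $3 = 1 } 1' file
```

The trailing 1 is the usual shorthand for "print the (possibly modified) line".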

Related

Extract columns from multiple text files with bash or awk or sed?

I am trying to extract column1 and column4 from multiple text files.
file1.txt:
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
CFLAU10s46802|kraken:taxid|33189 1 125 2 105 84 1.68 36.8 24
CFLAU10s46898|kraken:taxid|33189 1 116 32 116 100 23.5862 35.7 19.4
CFLAU10s46988|kraken:taxid|33189 1 105 2 53 50.4762 1.00952 36.9 11
AUZW01004514.1 Cronartium comandrae C4 contig1015102_0, whole genome shotgun sequence 1 1102 2 88 7.98548 0.15971 36.4 10
AUZW01004739.1 Cronartium comandrae C4 contig1070682_0, whole genome shotgun sequence 1 2133 6 113 5.2977 0.186592 36.6 13
file2.txt:
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
CFLAU10s46802|kraken:taxid|33189 1 125 5 105 84 1.68 36.8 24
CFLAU10s46898|kraken:taxid|33189 1 116 40 116 100 23.5862 35.7 19.4
CFLAU10s46988|kraken:taxid|33189 1 105 6 53 50.4762 1.00952 36.9 11
AUZW01004514.1 Cronartium comandrae C4 contig1015102_0, whole genome shotgun sequence 1 1102 2 88 7.98548 0.15971 36.4 10
AUZW01004739.1 Cronartium comandrae C4 contig1070682_0, whole genome shotgun sequence 1 2133 6 113 5.2977 0.186592 36.6 13
Output format (save the output as merged.txt in another directory): in the output file, column 1 (#rname) appears only once, because it is the same in every file, but there is one column 4 (numreads) per input file, and each of those columns should be renamed after the file it came from.
Output file looks like:
#rname file1_numreads file2_numreads
CFLAU10s46802|kraken:taxid|33189 2 5
CFLAU10s46898|kraken:taxid|33189 32 40
CFLAU10s46988|kraken:taxid|33189 2 6
AUZW01004514.1 Cronartium comandrae C4 contig1015102_0, whole genome shotgun sequence 2 88
AUZW01004739.1 Cronartium comandrae C4 contig1070682_0, whole genome shotgun sequence 6 113
Your suggestions would be appreciated.
Here is something I put together. awk gurus might have a simpler, shorter version, but I am still learning awk.
Create a file script.awk and make it executable. Put in it:
#!/usr/bin/awk -f
BEGIN { FS = "\t" }

# process files, ignoring comment lines
!/^#/ {
    # keep the first-column values,
    # only adding a new value if it is not already in the array
    if (!($1 in firstcolumns)) {
        firstcolumns[$1] = $1
    }
    # extract the 4th column of file1, put it in the array under (column 1).1
    if (FILENAME == ARGV[1]) {
        results[$1 ".1"] = $4
    }
    # extract the 4th column of file2, put it in the array under (column 1).2
    if (FILENAME == ARGV[2]) {
        results[$1 ".2"] = $4
    }
}

# print the results
END {
    # for each first-column value...
    for (key in firstcolumns) {
        # print the first column, then (column 1).1, then (column 1).2
        print key "\t" results[key ".1"] "\t" results[key ".2"]
    }
}
Call it like this: ./script.awk file1.txt file2.txt.
Since awk parses the files line by line, I keep the possible values of the first column in an array (firstcolumns).
For each line, if the 4th column comes from file1.txt (ARGV[1]) I store it in the results array under (firstcolumn).1.
For each line, if the 4th column comes from file2.txt (ARGV[2]) I store it in the results array under (firstcolumn).2.
In the END block, loop through the possible firstcolumn values and print the values (firstcolumn).1 and (firstcolumn).2, separated by "\t" for tabs.
Results:
$ ./script.awk file1.txt file2.txt
AUZW01004514.1 C4 C4
CFLAU10s46988|kraken:taxid|33189 2 6
CFLAU10s46802|kraken:taxid|33189 2 5
AUZW01004739.1 C4 C4
CFLAU10s46898|kraken:taxid|33189 32 40
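For what it's worth, here is a shorter variant of the same idea; a sketch only, but it sticks to portable awk, generalizes to any number of input files by counting them with FNR==1, and preserves the input order of the first column:

```shell
awk 'BEGIN { FS = OFS = "\t" }
FNR == 1 { fileno++ }                           # FNR resets at each new file
/^#/ { next }                                   # skip header lines
!($1 in seen) { seen[$1]; order[++n] = $1 }     # remember first-column order
{ val[$1, fileno] = $4 }                        # 4th column, keyed by rname and file index
END {
    for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 1; j <= fileno; j++) line = line OFS val[order[i], j]
        print line
    }
}' file1.txt file2.txt
```

Like the script above, it assumes the input is genuinely tab-delimited.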

Add a letter to the 46th word of every line in a text file using Linux

I have a text file with 500k rows. In each line there are 47 words.
I want to add the letter 'C' to the 46th word in every line except the first line, using Linux (the first line is the table title).
For example:
ID FID IID ..... number_center age
1 1001 807 ..... 10960 47
3 900 818 ..... 10877 51
The output: new text file
ID FID IID ..... number_center age
1 1001 807 ..... C10960 47
3 900 818 ..... C10877 51
I tried to find the answer but did not find one.
Thank you
Using awk, set the 46th space-delimited field (the 46th word) to "C" plus the 46th field. Print the line with the shorthand 1. Ignore the first line with NR>1.
awk 'NR>1 { $46="C"$46 }1' filename
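One caveat: because awk rebuilds the modified line, any runs of multiple spaces collapse to single spaces. If the original spacing matters, a sed substitution with a match-number flag touches only the 46th word; a sketch, assuming GNU sed for -E (the numeric flag itself is standard):

```shell
sed -E '2,$ s/[^[:space:]]+/C&/46' filename
```

The 2,$ address skips the title line, and the trailing 46 applies the substitution to the 46th match only.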

Datamash: transposing a column into rows based on groups in bash

I have a tab delim file with a 2 columns like following
A 123
A 23
A 45
A 67
B 88
B 72
B 50
B 23
C 12
C 14
I want to transpose with the above data based on the first column like following
A 123 23 45 67
B 88 72 50 23
C 12 14
I tried datamash transpose < input-file.txt but it didn't yield the expected output.
One awk version:
awk '{printf ($1!=f?"\n%s":" "$2),$0;f=$1}' file
A 123 23 45 67
B 88 72 50 23
C 12 14
With this version you get one leading blank line, but it should be fast and handle large data, since no loops or array variables are used.
$1!=f?"\n%s":" "$2),$0: if the first field is not equal to f, print a newline and all the fields;
if $1 equals f, print only a space and field 2.
f=$1 sets f to the first field for the next line.
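If the leading blank line is a problem, or the input may be unsorted, an array-based sketch keyed on the first column handles both (at the cost of holding the data in memory):

```shell
awk '
!($1 in row) { order[++n] = $1; row[$1] = $0; next }  # first time a key is seen
{ row[$1] = row[$1] " " $2 }                          # append later values to its line
END { for (i = 1; i <= n; i++) print row[order[i]] }
' file
```

The order array preserves the order in which keys first appear.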
datamash --group=1 --field-separator=' ' collapse 2 <file | tr ',' ' '
Output:
A 123 23 45 67
B 88 72 50 23
C 12 14
Input must be sorted, as in the question.
This might work for you (GNU sed):
sed -E ':a;N;s/^((\S+)\s+.*)\n\2/\1/;ta;P;D' file
Append the next line and if the first field of the first line is the same as the first field of the second line, remove the newline and the first field of the second line. Print the first line in the pattern space and then delete it and the following newline and repeat.

awk split adds whole string to array position 1 (reason unknown)

So I have a .txt file that looks like this:
mona 70 77 85 77
john 85 92 78 80
andreja 89 90 85 94
jasper 84 64 81 66
george 54 77 82 73
ellis 90 93 89 88
I have created a grades.awk script that contains the following code:
{
    FS=" "
    names=$1
    vi1=$2
    vi2=$3
    vi3=$4
    rv=$5
    #printf("%s ",names);
    split(names,nameArray," ");
    printf("%s\t",nameArray[1]); //prints the whole array of names for some reason, instead of just the name at position 1 in the array ("john")
}
So my question is: how do I split this correctly? Am I doing something wrong?
How do you read line by line, and word by word, correctly? I need to add each column to its own array. I've been searching for the answer for quite some time now and can't fix my problem.
Here is a template to calculate the average grade per student:
$ awk '{sum=0; for(i=2;i<=NF;i++) sum+=$i;
printf "%s\t%5.2f\n", $1, sum/(NF-1)}' file
mona 77.25
john 83.75
andreja 89.50
jasper 73.75
george 71.50
ellis 90.00
printf("%s\t",nameArray[1])
is doing exactly what you want it to do, but you aren't printing a newline between invocations. It gets called once per input line and outputs one word at a time, and since you aren't outputting any newlines between the words, you just get one line of output. Change it to:
printf("%s\n",nameArray[1])
There are a few other issues with your code, of course (e.g. you're setting FS in the wrong place and unnecessarily; names only ever contains one word, so splitting it into an array doesn't make sense; etc.), but I think that's what you were asking about specifically.
If that's not all you want then edit your question to clarify what you're trying to do and add concise, testable sample input and expected output.
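As for reading each column into its own array, which the question also mentions, a minimal sketch (the array names are illustrative) indexes each array by the record number NR:

```shell
awk '{
    names[NR] = $1
    vi1[NR] = $2; vi2[NR] = $3; vi3[NR] = $4; rv[NR] = $5
}
END {
    # e.g. print each student with the average of the four grades
    for (i = 1; i <= NR; i++)
        printf "%s\t%5.2f\n", names[i], (vi1[i] + vi2[i] + vi3[i] + rv[i]) / 4
}' file
```

No split is needed: awk has already split each line into $1..$NF using the default field separator.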

Variable Delimited Text in Excel

I have a string of text that I need delimited:
New Utilizers 75 28 9 66 66 79 74 69 29 21 84 75 675 20,511 45,925
Ordinarily I would just use a space delimiter and I'd be all set, but this splits the "New Utilizers" string into two columns instead of one. Is there a way to start delimiting after a certain point, in this case after "New Utilizers"?
Can you choose another delimiter, say $ or ;?
If $, for example:
New Utilizers$75$28$9$66$66$79$74$69$29$21$84$75$675$20,511$45,925
then split by $.
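If the data passes through a shell on its way to Excel, an awk sketch can insert that alternate delimiter automatically: keep a space between two label words and emit $ wherever the next field looks numeric (the numeric test here is an assumption about what the data contains):

```shell
awk '{
    for (i = 1; i <= NF; i++) {
        # $ before a numeric field (digits and thousands commas), space otherwise
        sep = (i == NF) ? "\n" : (($(i+1) ~ /^[0-9,]+$/) ? "$" : " ")
        printf "%s%s", $i, sep
    }
}' file
```

The result can then be imported with $ as the delimiter, keeping "New Utilizers" in one column.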
