Manipulate CSV file: increment cell coordinates/position - excel

I have a CSV file with one entry on each line; three entries form a whole dataset. What I need to do now is put each set into the columns of a single row. I have difficulty describing the problem (which is why my search did not turn up a solution), so here's an example.
Sample CSV file:
1 Joe
2 Doe
3 7/7/1990
4 Jane
5 Done
6 6/6/2000
What I want in the end is this:
1 Name Surname Birthdate
2 Joe Doe 7/7/1990
3 Jane Done 6/6/2000
I'm trying to find a solution to make this automatically, as my actual file consists of 480 datasets, each set containing 16 entries, and it would take me days to do it manually.
I was able to fill the first line with Excel's indirect function:
=INDIRECT("A"&COLUMN()-COLUMN($A1))
As COLUMN returns the column number, if I drag the first line down in Excel, obviously this shows exactly the same as the first line:
1 Name Surname Birthdate
2 Joe Doe 7/7/1990
3 Joe Doe 7/7/1990
Now I'm looking for a way to increment the cell position by one:
A B C D
1 Joe =A1 =B1+1 =C1+1
2 Doe =D1+1
3 7/7/1990
4 Jane
Which should lead to:
A B C D
1 Joe =A1 =A2 =A3
2 Doe =A4 =A5 =A6
3 7/7/1990
4 Jane
As you can see in the example, the row number of the referenced cell in column A increments by one each time, and I have no idea how to do this automatically in Excel. I think there must be a better way than nesting Excel functions, as the task (increment by 1) actually seems pretty easy.
I'm also open to solutions involving sed, awk (of which I only have a very superficial knowledge) or other command line tools.
Your help is appreciated very much!

awk 'BEGIN { y = 1; x = 1; print "Name Surname Birthdate" }
{
    if (x == 1) printf "%s", y    # start each output row with its number
    if (x == 3) {                 # third entry completes the dataset
        printf " %s\n", $2
        y = y + 1
        x = 1
    } else {
        printf " %s", $2
        x = x + 1
    }
}' input_file.txt
This may work for what you want to do. Your sample does not include commas, so I'm not sure whether they are really in the file. If they are, you will need to add the -F, option so that awk splits fields on them.
This second code snippet will provide the output with a comma delimiter. Again, it is assuming that your sample input file did not have commas to delimit the 1 Joe and 2 Doe.
awk 'BEGIN { y = 1; x = 1; print "Name,Surname,Birthdate" }
{
    if (x == 1) printf "%s", y    # start each output row with its number
    if (x == 3) {                 # third entry completes the dataset
        printf ",%s\n", $2
        y = y + 1
        x = 1
    } else {
        printf ",%s", $2
        x = x + 1
    }
}' input_file.txt
Both of the awk scripts will set x and y variables to one, where the y variable will increment your line numbering. The x variable will count up to 3 and then reset itself back to one. This is so that it prints each line in a row, until it gets to the 3rd item where it will then insert a newline character.
There are easier/more complex ways to do this with regexes and a language like perl, but since you mentioned awk, I believe this will work fine.
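Since the real file has 16 entries per dataset rather than 3, the same idea can be parameterised. What follows is only a minimal sketch and not part of the original answer: reshape is a hypothetical helper name, and it assumes every input line has the form "number value" with a single-word value, as in the sample.

```shell
# Hypothetical helper: turn every n consecutive lines of STDIN into one
# numbered output row. Assumes each line is "<line-number> <value>",
# so the value is awk field 2.
reshape() {
    awk -v n="$1" '
        { row = (NR % n == 1 || n == 1) ? $2 : row " " $2 }   # start or extend the current row
        NR % n == 0 { print ++y, row }                          # dataset complete: print it with its number
    '
}

# Sample data from the question, three entries per dataset.
printf '1 Joe\n2 Doe\n3 7/7/1990\n4 Jane\n5 Done\n6 6/6/2000\n' | reshape 3
```

For the 16-entry file this would be called as reshape 16 < input_file.txt. A trailing incomplete dataset is silently dropped in this sketch.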

Related

Extract substring from first column

I have a large text file with 2 columns. The first column is large and complicated, but contains a name="..." portion. The second column is just a number.
How can I produce a text file such that the first column contains ONLY the name, but the second column stays the same and shows the number? Basically, I want to extract a substring from the first column only AND have the 2nd column stay unaltered.
Sample data:
application{id="1821", name="app-name_01"} 0
application{id="1822", name="myapp-02", optionalFlag="false"} 1
application{id="1823", optionalFlag="false", name="app_name_public"} 3
...
So the result file would be something like this
app-name_01 0
myapp-02 1
app_name_public 3
...
If your actual Input_file matches the sample shown, the following code may help.
awk '{sub(/.*name=\"/,"");sub(/\".* /," ")} 1' Input_file
Output will be as follows.
app-name_01 0
myapp-02 1
app_name_public 3
Using GNU awk
$ awk 'match($0,/name="([^"]*)"/,a){print a[1],$NF}' infile
app-name_01 0
myapp-02 1
app_name_public 3
Non-Gawk
awk 'match($0,/name="([^"]*)"/){t=substr($0,RSTART,RLENGTH);gsub(/name=|"/,"",t);print t,$NF}' infile
app-name_01 0
myapp-02 1
app_name_public 3
Input:
$ cat infile
application{id="1821", name="app-name_01"} 0
application{id="1822", name="myapp-02", optionalFlag="false"} 1
application{id="1823", optionalFlag="false", name="app_name_public"} 3
...
Here's a sed solution:
sed -r 's/.*name="([^"]+).* ([0-9]+)$/\1 \2/g' Input_file
Explanation:
The parentheses capture whatever they enclose into groups.
The first group is everything after name=" up to the next ". [^"] means "not a double quote".
The second group is simply "one or more digits at the end of the line, preceded by a space".
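To see the two groups at work, here is a small sketch that feeds a single sample line through the same substitution. The function name name_and_count is made up for illustration, and -E is used instead of -r on the assumption that your sed supports it (GNU and BSD sed both do).

```shell
# Hypothetical wrapper around the substitution explained above.
# \1 is the text captured after name=", \2 is the trailing number.
name_and_count() {
    sed -E 's/.*name="([^"]+).* ([0-9]+)$/\1 \2/'
}

# One line taken from the sample data in the question.
line='application{id="1822", name="myapp-02", optionalFlag="false"} 1'
printf '%s\n' "$line" | name_and_count
```

Lines that do not match the pattern pass through unchanged, so unexpected input is easy to spot in the output.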

Multiple text insertion in Linux

Can someone help me write a command that inserts some text at multiple places (given a column and row) in a file that already contains data? For example, old_data is a file that contains:
A
And I wish to get new_data that will contain:
A 1
I have read something about the awk and sed commands, but I don't think I understand how to use them to get what I want.
I would add that I want to use this command as part of a script:
for b in ./*/ ; do (cd "$b" && command); done
If we imagine the content of old_data as a matrix of elements {An*m}, where n is the row number and m the column number, I wish to manipulate the matrix so that I can add new elements. A in old_data has coordinates (1,1). In new_data, therefore, I wish to assign 1 to the matrix element with coordinates (1,3).
If we compare content of old_data and new_data we see that (1,2) element corresponds to space (it is empty).
It's not at all clear to me what you are asking for, but I suspect you are saying that you would like a way to insert some given text into a particular row and column. Perhaps:
$ cat input
A
B
C
D
$ row=2 column=2 text="This is some new data"
$ awk 'NR==row {$column = new_data " " $column}1' row=$row column=$column new_data="$text" input
A
B This is some new data
C
D
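If the target column does not exist yet, as with the (1,3) element in the question, awk can create it: assigning to a field beyond NF adds the missing fields as empty strings. The sketch below is an illustration under assumptions, not the answer's method: set_cell is a hypothetical name, the data is assumed to be comma-separated so that the empty column is visible, and an existing value in the target field is overwritten rather than prepended to.

```shell
# Hypothetical helper: put TEXT into field COL of line ROW, reading STDIN.
# Usage: set_cell ROW COL TEXT
set_cell() {
    awk -v row="$1" -v col="$2" -v text="$3" '
        BEGIN { FS = OFS = "," }      # assumption: comma-separated data
        NR == row { $col = text }     # fields between NF and col are created empty
        1                             # print every line, modified or not
    '
}

# Element (1,3) gets the value 1; element (1,2) stays empty.
printf 'A\nB\n' | set_cell 1 3 1
```

Several insertions can be chained with pipes, for example set_cell 1 3 1 | set_cell 2 2 x, and the whole pipeline can be used as the command inside the for loop from the question.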
This bash & unix tools code works:
# make the input files.
echo {A..D} | tr ' ' '\n' > abc ; echo {1..4} | tr ' ' '\n' > 123
# print as per previous OP spec
head -1q abc 123 ; paste abc 123 123 | tail -n +2
Output:
A
1
B 2 2
C 3 3
D 4 4
Version #3, (using commas as more visible separators), as per newest OP spec:
# for the `sed` code change the `2` to whatever column needs deleting.
paste -d, abc 123 123 | sed 's/[^,]*//2'
Output:
A,,1
B,,2
C,,3
D,,4
The same, with tab delimiters (less visually obvious):
paste abc 123 123 | sed 's/[^\t]*//2'
A 1
B 2
C 3
D 4

How to sort lines in textfile according to a second textfile

I have two text files.
File A.txt:
john
peter
mary
alex
cloey
File B.txt
peter does something
cloey looks at him
franz is the new here
mary sleeps
I'd like to
merge the two
sort one file according to the other
put the unknown lines of B at the end
like this:
john
peter does something
mary sleeps
alex
cloey looks at him
franz is the new here
$ awk '
NR==FNR { b[$1]=$0; next }
{ print ($1 in b ? b[$1] : $1); delete b[$1] }
END { for (i in b) print b[i] }
' fileB fileA
john
peter does something
mary sleeps
alex
cloey looks at him
franz is the new here
The above will print the remaining items from fileB in a "random" order (see http://www.gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array for details). If that's a problem then edit your question to clarify your requirements for the order those need to be printed in.
It also assumes the keys in each file are unique (e.g. peter only appears as a key value once in each file). If that's not the case then again edit your question to include cases where a key appears multiple times in your sample input/output and additionally explain how you want them handled.
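If the leftover lines should come out in the order in which they appear in fileB, one option is to record that order while reading fileB. This is a sketch under the same unique-key assumption; merge_in_order and the temporary file names are made up for the example.

```shell
# Hypothetical helper. Usage: merge_in_order fileB fileA
merge_in_order() {
    awk '
        NR == FNR { b[$1] = $0; order[++n] = $1; next }   # first file: store lines and remember key order
        { print ($1 in b ? b[$1] : $1); delete b[$1] }     # second file: its order drives the output
        END {                                              # unmatched lines, in first-file order
            for (i = 1; i <= n; i++)
                if (order[i] in b) print b[order[i]]
        }
    ' "$1" "$2"
}

# Recreate the sample files from the question in a temporary directory.
tmp=$(mktemp -d)
printf 'john\npeter\nmary\nalex\ncloey\n' > "$tmp/A.txt"
printf 'peter does something\ncloey looks at him\nfranz is the new here\nmary sleeps\n' > "$tmp/B.txt"
merge_in_order "$tmp/B.txt" "$tmp/A.txt"
rm -r "$tmp"
```

The extra order array costs one more entry per line of fileB, which should be negligible unless the file is very large.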

Search if two strings both in separate columns exist in one row in excel?

How can I use the FIND function to see if two strings both exist in a row in Excel?
EX:
row/column A B C D
1 fly cat dog fish
2 cat pig horse dog
3 zebra pig cat elephant
I want to find which rows contain both cat and dog. How can I achieve this?
How about using the AND and COUNTIF functions together:
=AND(COUNTIF(A1:D1,"cat"),COUNTIF(A1:D1,"dog"))
If the row contains both cat and dog, it returns TRUE.
If the formula returns a 1, the row contains both cat and dog. If it returns 0, at least one is missing from the row:
=SUMPRODUCT(MAX(--(A1:D1="cat"))*MAX(--(A1:D1="dog")))
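If you really want to use the FIND function, this works too, though it is longer. FIND returns a #VALUE! error when the text is not found, so each call is wrapped in ISNUMBER:
=AND(ISNUMBER(FIND("cat",CONCATENATE(A1,"|",B1,"|",C1,"|",D1))),ISNUMBER(FIND("dog",CONCATENATE(A1,"|",B1,"|",C1,"|",D1))))
The separator | is important! Without it, different cell contents can concatenate to the same string:
cat + dog = catdog
ca + tdog = catdog
Adding a separator | (or any other character) keeps them distinct:
cat + | + dog = cat|dog
ca + | + tdog = ca|tdog
Note that FIND is case-sensitive.
If you need a case-insensitive match, use the SEARCH function instead.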
If you are happy to run a formula on each line, then you could copy this down and check for a value of "Found them".
=IF(AND(COUNTIF($A3:$D3,"cat")>0,COUNTIF($A3:$D3,"dog")>0),"Found them","")
You could replace the literals with cell references so that you can easily change your search words. Maybe something like this:
=IF(AND(COUNTIF($A3:$D3,$F$1)>0,COUNTIF($A3:$D3,$G$1)>0),"Found them","")

How to merge 2 rows into 1 row at the same column using awk

I have just started using UNIX and do not have much scripting experience. I am struggling to merge 2 rows within the same column. Below is the original data.
The column names are split across 2 rows but ideally should be in 1 row.
But I don't know how to do it.
Original File
User Middle Last
Name Name Name
Htat Ko Lin
John Smith Bill
Trying to achieve:
UserName MiddleName LastName
Htat Ko Lin
John Smith Bill
Thanks!
Htat Ko
This can be done using awk and for loops
awk 'NR==1{for(i=1;i<=NF;i++)a[i]=$i;next}NR==2{for(i=1;i<=NF;i++)$i=a[i]$i}1' file
Output
UserName MiddleName LastName
Htat Ko Lin
John Smith Bill
Explanation
NR==1
If the record number is 1 (i.e. the first record), execute the following block.
for(i=1;i<=NF;i++)
Loop from one to the number of fields (NF), incrementing by one each time.
a[i]=$i
Using i as the key, set the element of array a to field i ($i).
next
Skip all further instructions and move to the next record.
NR==2
Same as before but for record 2
for(i=1;i<=NF;i++)
Exactly the same as before
$i=a[i]$i
Set field i to the value stored in the array followed by the field's own current value.
1
A pattern that is always true, so every line is printed unless next has been used.
Additional notes
If you want to keep the columns aligned, the easiest way to do this is to pipe that command into column -t
awk '...' file | column -t
Reduced version
awk '{for(i=1;i<=NF;i++)(NR==2&&$i=a[i]$i)||a[i]=$i}NR>1' file
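For readers who find the one-liners dense, the same approach can be spelled out and generalised to any number of header rows. This is a sketch only: merge_header is a hypothetical name, the number of header rows is passed as an argument, and fields are assumed to be whitespace-separated single words as in the sample.

```shell
# Hypothetical helper: join the first HDR lines of STDIN column by column,
# then pass the remaining lines through unchanged.
# Usage: merge_header HDR
merge_header() {
    awk -v hdr="$1" '
        NR <= hdr {                       # header rows: append each field to its column
            for (i = 1; i <= NF; i++) a[i] = a[i] $i
            if (NF > n) n = NF            # remember the widest header row
        }
        NR == hdr {                       # last header row: print the merged header
            line = a[1]
            for (i = 2; i <= n; i++) line = line " " a[i]
            print line
        }
        NR > hdr                          # data rows are printed as they are
    '
}

printf 'User Middle Last\nName Name Name\nHtat Ko Lin\nJohn Smith Bill\n' | merge_header 2
```

As with the original command, the result can be piped into column -t to align the columns.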
