I have tab-delimited files produced by radio automation software: one file per day, with the following fields: date, time, artist, title. I need these files left intact, but I also need the following: when the first file is produced, it should be duplicated and renamed, and the first two fields (date and time) removed from the copy. Then all subsequent files must be appended to this copy, again with only artist and title.
Is this possible at all?
Thanks
In Bash I'd just do it with awk:
awk -F "\t" '{print $1 "\t" $2}' file1.txt > file2.txt
Here -F "\t" sets the field separator to a tab, so $1 is the text before the first tab and $2 is the text between the first and second tabs. The "\t" inside print puts a tab between the two fields in the output. This prints the first two fields; for a file laid out as date, time, artist, title, the fields you want are $3 and $4.
Let's say these are the contents of your current file1:
date:time:artist:title
You want to make file2 print out artist (tab) title
The awk code in your bash script would be:
awk -F ":" '{print$3"\t"$4}' file1.txt > file2.txt
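For the tab-delimited files described in the question, the same idea can be wrapped in a small function. This is only a sketch under assumptions: the function name append_artist_title and the file names in the usage line are made up for illustration, and it assumes artist and title are always fields 3 and 4. The daily files are only read, so they stay intact, and because >> creates the output file when it is missing, the "first file" case and the later appends are the same operation.

```shell
#!/bin/sh
# append_artist_title OUTFILE DAILY_FILE...
# Appends only artist and title (fields 3 and 4) of each tab-delimited
# daily file to OUTFILE. The daily files themselves are never modified.
append_artist_title() {
    out=$1
    shift
    for f in "$@"; do
        # -F sets the input separator, OFS the output separator (both a tab).
        awk -F '\t' -v OFS='\t' '{ print $3, $4 }' "$f" >> "$out"
    done
}
```

A call such as append_artist_title playlist_all.txt 2012-07-24.txt (hypothetical names) could then be run once per day after the automation software has written its file.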
I have a pipe-delimited text file that also has a start-of-record indicator ('START_OF_LINE'). The values are enclosed in single quotes, and line breaks are expected in the 5th field. Notice that the values with line breaks are still enclosed in single quotes.
Does Excel have a native way to handle this? As far as I know, Excel can only take a custom delimiter. It's the START_OF_LINE marker that is causing the issue.
Sample screen shot of desired output, followed by input, followed by input as text.
|'START_OF_LINE'|'Key 1'|'Key 2'|'Key 3'|'text1
text2
text3
text4
text5'|'Date'|'END_OF_LINE'|'ID 1'|'ID 2'|'ID 3'|'ID 4'|'ID 5'|
|'START_OF_LINE'|'Key 1'|'Key 2'|'Key 3'|'text5
text6
text7
text8
text9'|'Date'|'END_OF_LINE'|'ID 1'|'ID 2'|'ID 3'|'ID 4'|'ID 5'|
I'm sure this can be hacked together with some tedious VBA, but I'm really hoping there is a better way to do this before I start writing code. I just have no idea how to handle the new-line field using native functionality in Excel.
The structure seems to be consistent. I used Notepad++'s find-and-replace function on the text (with Search Mode set to Extended so that \r and \n are interpreted), and it seems to deliver what you need.
Copy the text above and paste it into Notepad++, then replace "|\r\n|" with "|·|" **
Then replace "\n" with "\n||||"
Then replace "|·|" with "|\n"
Remove the first "|"
Copy and paste into Excel, with "|" as the delimiter.
Done.
** [Note: \r may not appear in the original file; it is there after copying and pasting. Omit it if it does not apply.]
If all of the above can be executed using regex, then it is just a line of code away. (:
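As a rough scripted equivalent of those replacements, here is a minimal awk sketch. It rests on assumptions: the function name records_to_rows is invented, every record is assumed to start on a line beginning with "|", and continuation lines are assumed never to begin with "|". Record-start lines lose their leading pipe, and continuation lines are prefixed with four pipes so that their text lands in the fifth column.

```shell
#!/bin/sh
# records_to_rows [FILE...]
# Rewrites a pipe-delimited file with multi-line fifth fields so that
# each physical line can be imported using "|" as the delimiter.
records_to_rows() {
    awk '
    { sub(/\r$/, "") }                                     # drop a trailing CR, if any
    substr($0, 1, 1) == "|" { print substr($0, 2); next }  # record start: remove leading pipe
    { print "||||" $0 }                                    # continuation: shift into column 5
    ' "$@"
}
```

The output can then be opened in Excel with "|" as the delimiter, as in the manual steps.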
I found these answers, but it is not clear to me from them how to simply create Excel sheets.
The two marked as answers here do create a sheet from a given txt file, BUT the data in the two txt columns gets inserted into a single column of the resulting table:
How to convert a text file into excel in bash or perl
As if the tab delimiter didn't work.
This answer does the same for me:
How to write text file data into same cell of excel using bash
This one is too complicated for an amateur:
Paste output into a CSV file in bash with paste command
I am just not able to decipher and simplify it.
This does the same thing; the columns end up merged into the first one:
#!/bin/bash
while read value; do
echo "$value"
done <tabulka.txt > test.csv
May I ask for a simple way to put data into an xls/csv file? I'm not really a bash expert, just an engineer forced to work with it. Thanks!
EDIT:
sample textfile as requested (tab as delimiter):
header1 header2
aaaa 1.0
bbbb 1.1
cccc 1.3
result:
OK, so the first step is to convert the .txt file to .csv. You don't provide your input data, so it is hard to see how you want to split it. If we assume that your input file has a fairly regular structure and you simply want to replace each tab with ",", you can do it like this. (But it would really be better if you clarified your question with sample data.):
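#!/bin/bash
# Read tabulka.txt line by line and replace each tab with a comma.
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r value; do
    printf '%s\n' "$value" | tr '\t' ','
done < tabulka.txt > test.csv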
The second part is explained in the link you pasted: Paste output into a CSV file in bash with paste command
Excel can read the csv file (as can LibreOffice). This works for me:
libreoffice --calc test.csv
If your file is irregular in any way, it's not possible to guess its structure; you have to show it.
Edit: I was writing my answer when you posted your input, so I'm editing it to reflect that you use tabs in the input.
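A plain tab-to-comma swap breaks as soon as a value itself contains a comma or a double quote. A slightly more careful conversion can be sketched with awk; the function name tsv_to_csv is hypothetical, and the quoting follows the common CSV convention (wrap the field in double quotes and double any embedded quotes), which is an assumption about what the reading program expects.

```shell
#!/bin/sh
# tsv_to_csv [FILE...]
# Converts tab-separated input to comma-separated output, quoting
# fields that contain a comma or a double quote.
tsv_to_csv() {
    awk -F '\t' -v OFS=',' '
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /[",]/) {
                gsub(/"/, "\"\"", $i)   # double any embedded quotes
                $i = "\"" $i "\""       # wrap the field in quotes
            }
        }
        $1 = $1                         # force the line to be rebuilt with OFS
        print
    }' "$@"
}
```

For example, tsv_to_csv tabulka.txt > test.csv would take the place of the loop above.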
I have received a .csv file.
When I open this file with Notepad, all of the information is displayed in one row:
Email;Cityjohnsmith#live.com;New York
However, when I open the file with MS Excel, it displays the information correctly. How can I recognize the delimiter character? I ask because a third program that is supposed to read this file is not able to recognize the delimiter.
So the problem appears to be that your CSV isn't comma-delimited.
Judging by what you copied from Notepad, the data is delimited by ";". That is, each piece of data is separated not by the typical comma (,) but by a semicolon (;). This is why Notepad, which simply shows the raw text, displays different results from MS Excel, which attempts, successfully, to find a reasonably common delimiter in the file and split the data on it.
You may be well served by either A) writing your code to recognize the semicolon, not the comma, as the delimiter, or B) using one of your tools to replace the semicolons with commas.
.csv originally referred to comma-separated values (CSV). However, any character may be used to separate the values; the most common delimiters are the comma, tab, semicolon and colon. If the data is generated by another application, you might need to accept semicolons as delimiters.
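One simple way to guess the delimiter is to count how often each common candidate occurs in the first line and pick the most frequent. The sketch below does that with awk. The function name detect_delimiter and the candidate list are assumptions, and the heuristic can be fooled, for example by a header that contains times with colons.

```shell
#!/bin/sh
# detect_delimiter FILE
# Prints the name of the candidate delimiter that occurs most often
# in the first line of FILE, or "unknown" if none occurs.
detect_delimiter() {
    awk '
    # Count how many times the single character ch occurs in s.
    function count(s, ch,    n, i) {
        n = 0
        for (i = 1; i <= length(s); i++)
            if (substr(s, i, 1) == ch) n++
        return n
    }
    NR == 1 {
        cand[1] = ",";  name[1] = "comma"
        cand[2] = ";";  name[2] = "semicolon"
        cand[3] = "\t"; name[3] = "tab"
        cand[4] = ":";  name[4] = "colon"
        cand[5] = "|";  name[5] = "pipe"
        best = 0; max = 0
        for (k = 1; k <= 5; k++) {
            c = count($0, cand[k])
            if (c > max) { max = c; best = k }
        }
        print (best ? name[best] : "unknown")
        exit
    }' "$1"
}
```

For a file whose first line is Email;City, this reports semicolon.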
I'm not sure I'd write code for the problem as you describe. If I was forced to code it I'd write a short awk script to remove hidden (i.e. non-printing) characters.
I use two tools for csv issues. 010 Editor, from SweetScape Software Inc., will show you the file in hex, so you can see any non-displayable characters. The other, Delimit, from delimitware.com, is great for showing columns. In my opinion, 010 Editor will make your problem (and solution) obvious.
Here is a sample awk script that injects non-printing characters into your text. It then uses a regular expression to remove the non-printing characters.
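BEGIN {
    # Build a test string with a bell (\a) and a vertical tab (\v) injected.
    t = sprintf("%s\a%s\v%s", "Email;", "Cityjohnsmith#live.", "com;New York");
    print "Input :", t;
    # Remove every character outside printable ASCII (space through tilde).
    gsub(/[^\x20-\x7E]/, "", t);
    print "Result:", t;
}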
To run the above code, use the following command:
awk -f xx.awk
where the above code is put in a text file called xx.awk.
The regex /[^\x20-\x7E]/ identifies all characters that are not printable (i.e. not between 'space' and tilde in ASCII).
The awk gsub statement searches for all characters meeting the regex and removes them.
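The same clean-up can also be sketched with tr instead of awk, which avoids relying on \x escapes inside an awk regex. The function name strip_nonprinting is hypothetical. Unlike the regex above, this version deliberately keeps tabs, line feeds and carriage returns so that the file's line structure survives, and it also deletes any non-ASCII bytes (accented letters in UTF-8, for instance), which may or may not be what you want.

```shell
#!/bin/sh
# strip_nonprinting
# Filters stdin to stdout, keeping only tab (\11), line feed (\12),
# carriage return (\15) and printable ASCII (\40 through \176).
strip_nonprinting() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}
```

It is a filter, so it would be used as strip_nonprinting < input.csv > cleaned.csv (file names are placeholders).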
I have a question about how to automate the process of copying content from a .srt file into a .xls file.
I want to make sure that the content of the .srt file is pasted into the corresponding columns of the .xls file (e.g. the time-in into column B, the time-out into column C, and the subtitles into column E).
In order to avoid manually copying and pasting, is there a way to script this process? Any ideas?
Thank you very much in advance! :)
UPDATE: I just found that Subtitle Edit can save .srt as csv, which can then be turned into an Excel file. That's handy! But there's another problem: I need to copy the content from this csv into another Excel template, which has a different structure, so I can't directly copy and paste the values from the csv. I'm working on how to make this easier...
I can't post images for now, but the situation is this: each time-in text in the csv converted from the srt takes up one row, while the time-in text in the Excel template takes up two rows, so I can't directly copy and paste all the text from one Excel file to the other. Is there an easier way to do this? Thank you!
In a script, you can use perl to do the substitution:
perl -0777 -pe 's/\n([^\n])/\t$1/g; s/ --> /\t/g' input.srt | \
perl -ne 's/^\t//; print unless /^$/' > output.csv
For this sample input:
1
00:00:01,478 --> 00:00:04,020
Srt sample

2
00:00:05,045 --> 00:00:09,545
<i>italic</i> font

3
00:00:09,378 --> 00:00:13,745
<b>bold</b> font

4
00:00:14,812 --> 00:00:16,144
Multi
Line
you get the following output:
1 00:00:01,478 00:00:04,020 Srt sample
2 00:00:05,045 00:00:09,545 <i>italic</i> font
3 00:00:09,378 00:00:13,745 <b>bold</b> font
4 00:00:14,812 00:00:16,144 Multi Line
Regarding the command: there are two chained perl commands.
The first one does the hard work: it replaces newlines and arrows with tabs (keeping each double newline as a single newline).
The second one only does some cleaning: it removes tabs from the beginnings of lines and removes redundant empty lines.
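The same transformation can also be sketched with awk in paragraph mode, where an empty RS makes each blank-line-separated subtitle block one record. This is an alternative sketch, not the perl answer's method: the function name srt_to_tsv is invented, it assumes LF line endings and well-formed blocks (index, timing line, then text), and it joins multi-line subtitle text with a space rather than a tab.

```shell
#!/bin/sh
# srt_to_tsv [FILE...]
# Prints one tab-separated row per subtitle block:
# index, time-in, time-out, text.
srt_to_tsv() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
    {
        split($2, times, " --> ")              # times[1] = time-in, times[2] = time-out
        text = $3
        for (i = 4; i <= NF; i++)              # join any further text lines with a space
            text = text " " $i
        print $1, times[1], times[2], text
    }' "$@"
}
```

Running srt_to_tsv input.srt > output.tsv gives a file that Excel can import with tab as the delimiter.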
In Vim I would use:
:%s/\(.\)$\n\|-->/\1\t/g | :g/^$/d | :%s=\s\+$==
I know this is still not a script, but the result should now be easy to import into Excel :-)
It means: find each line ending with a character and replace that ending with the character plus a tab, or find the characters --> and replace them with a tab; then delete empty lines; and finally remove whitespace at the ends of lines.
I need to replace a portion of a line of text with another line of text, for example:
07/24/2012 06:30:00 <--what i start with
07/24/2012 06:30:00 Name=weather <---is what i need it to look like
The date changes every day, and I have about 20 of these lines to change each day. What is the easiest way to do this with a bat file? I want to be able to run it so that it opens the file, changes what needs to be changed, and then writes the changed text file to another location. There are hundreds of lines in this text file that need to stay unchanged in the new one; only about 20 or so need to be changed. It doesn't need to loop, since every time I edit the file the text to change will be exactly the same, and the same number of lines will change each time. Thanks in advance.
One way using sed:
sed -e "s/\(.*\)/\1 Name=weather/" file.txt > /your/new/location/newfile.txt
Perhaps you should update your question, to include example input and expected output. But the above line should get you started.
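Note that the sed line above appends Name=weather to every line of the file. To touch only the lines of interest, the substitution can be anchored to a pattern. The sketch below assumes that a line to change consists solely of a date in MM/DD/YYYY form followed by the time 06:30:00 (taken from the example in the question); the function name tag_weather_lines is hypothetical, and other time/name pairs would each need their own expression.

```shell
#!/bin/sh
# tag_weather_lines [FILE...]
# Appends " Name=weather" only to lines that are exactly a date followed
# by 06:30:00; all other lines pass through unchanged.
tag_weather_lines() {
    sed -E 's|^([0-9]{2}/[0-9]{2}/[0-9]{4} 06:30:00)[[:space:]]*$|\1 Name=weather|' "$@"
}
```

As with the original command, the result can be redirected to a new file in another location, leaving the input file as it was.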