Split large text file by date field and save as Month [closed]


As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I have large text files ranging from 400 MB to 1 GB.
Each is a tab-delimited file with the date as the second field. The records are not sorted by date and can be in any order.
The number of records and the period covered vary from file to file.
Each record has a date field in the format 14/02/2012 (i.e. dd/mm/yyyy). I want to split the text files by date and save each month as its own file, named by year and month (e.g. 2012Jan.txt).
The 2012Jan.txt file should contain only records for the period 1st Jan 2012 to 31st Jan 2012.
What would be the best way to do this? Could someone recommend a code/programming tool to achieve this?
Thanks

In a Windows .BAT file, with the header line repeated in each output file:
awk "NR==1{header=$0};NR>1{split($2,date,\"/\");file=date[3]strftime(\"%%b.txt\",((date[2]-1)*31+15)*24*60*60);if(!wrote[file]++)print header>file;print>file}" %1
(Here you pass the name of the input file as an argument to the .BAT call; if it's always the same file, you can change the %1 to that file name instead. The +15 puts each timestamp mid-month, so the local time zone cannot shift it into a neighbouring month.)

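In a UNIX shell:
awk 'NR>1{split($2,date,"/");print>(date[3]strftime("%b.txt",((date[2]-1)*31+15)*24*60*60))}' large.txt
In a Windows CMD shell:
awk "NR>1{split($2,date,\"/\");print>(date[3]strftime(\"%b.txt\",((date[2]-1)*31+15)*24*60*60))}" large.txt
In a Windows .BAT file:
awk "NR>1{split($2,date,\"/\");print>(date[3]strftime(\"%%b.txt\",((date[2]-1)*31+15)*24*60*60))}" large.txt
(This assumes your file is named large.txt and that its first line is a header, which NR>1 skips. strftime is not part of POSIX awk, so this needs gawk or another awk that provides it. The +15 puts each timestamp mid-month, so the local time zone cannot shift it into a neighbouring month.)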
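
For readers without awk at hand, the same single-pass idea can be sketched in Python. This is only an illustration under stated assumptions: the function names `month_file_name` and `split_by_month` are made up for this example, the input is assumed to be UTF-8 with no header line, and every non-blank record is assumed to carry a valid dd/mm/yyyy date in its second tab-separated field.

```python
import os

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_file_name(record):
    """Return e.g. '2012Feb.txt' for a record whose 2nd field is 14/02/2012."""
    date_field = record.rstrip("\r\n").split("\t")[1]
    _day, month, year = date_field.split("/")
    month_number = int(month)
    if not 1 <= month_number <= 12:
        raise ValueError(f"bad month in date field: {date_field!r}")
    return f"{year}{MONTHS[month_number - 1]}.txt"


def split_by_month(in_path, out_dir):
    """Stream in_path once, writing each record to its month file in out_dir."""
    handles = {}  # one open handle per month file seen so far
    try:
        with open(in_path, encoding="utf-8") as src:
            for record in src:  # line by line, so memory use stays small
                if not record.strip():
                    continue  # skip blank lines
                name = month_file_name(record)
                if name not in handles:
                    path = os.path.join(out_dir, name)
                    handles[name] = open(path, "w", encoding="utf-8")
                handles[name].write(record)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)  # names of the files that were written
```

Because the records are unsorted, the sketch keeps one output handle open per month rather than reopening files for every record; with a few years of data that is only a few dozen handles.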

Related

how to open a large (100GB) .txt file? [duplicate]

This question already has answers here:
Working with huge files in VIM
(10 answers)
Closed 9 years ago.
I have a .txt file of ~100GB. Is there a text editor that I can use to open this? If so, how will this actually be stored in memory? I only have 16GB of RAM.
I'm also exploring other options such as splitting the file into 2 or more pieces. Any suggestions on how to do this efficiently on the command line in linux?
Thanks

Take a look at the head and tail utilities if you are using the command line. I often use
tail -n <number of lines> <file> | more
To split the file, look at split.
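
If a scripted alternative is useful, splitting by line count can be sketched in Python without ever loading the whole file. This is a rough illustration in the spirit of `split -l`, not a replacement for it; the function name `split_file` and the `piece_0000` naming scheme are invented for this example.

```python
import os


def split_file(in_path, out_dir, lines_per_piece):
    """Copy in_path into numbered pieces of at most lines_per_piece lines."""
    pieces = []
    out = None
    try:
        # Binary mode: bytes are copied as-is, so the encoding does not matter.
        with open(in_path, "rb") as src:
            for index, line in enumerate(src):  # one line in memory at a time
                if index % lines_per_piece == 0:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"piece_{len(pieces):04d}")
                    out = open(path, "wb")
                    pieces.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return pieces  # paths of the pieces, in order
```

Concatenating the pieces in order gives back the original file byte for byte.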

Text Editor for gigabyte sized files [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Text editor to open big (giant, huge, large) text files
I saw text editor to open big text files but that question referred to megabyte sized files. I work with 7GB csv files and find that even vim and gedit take a long time to open up.
What text editor do you use for gigabyte sized files?
Appreciate any advice I can get.

I don't know about others, but I use vim (on Windows) for editing GB-sized files and it works every time. http://vim.sourceforge.net/

You can use Total Commander.

Which copy protection techniques are available for digital material? [closed]

Closed 10 years ago.
Suppose a website offers the following resources for premium users:
PDF Files
Video Files
Presentations (e.g. .ppt files)
Which protection techniques are available to prevent (or at least slow down) users from copying and redistributing these resources?

PDF - password protection
Video files - DRM (unable to play without license file)
Presentations? No idea.
Most of these techniques are also sure ways to repel normal users from your site.

One good way to protect your material is to make your web site the easiest way to get/view/access your stuff. Note that Apple makes millions of dollars selling MP3s on iTunes that are wholly unprotected, because it is easier for most people to grab them on iTunes than to find them on torrent sites.
Ultimately, you will not be able to prevent a determined user from copying and redistributing your material. The most you can do is try to slow them down. Whatever encryption method you end up using will require a key, and that key will need to end up on your user's computer. Therefore, a determined user will have everything they need to grab the content from you. What you can do is annoy average users enough that they decide it is not worth the trouble. However, there is a fine line to walk between annoying users enough that they pay, and annoying them so much that they leave your site entirely.

Nothing will prevent the user from redistributing anything that can be downloaded to the local device. Very few techniques will actually slow this down, either. Almost all of them will inconvenience legitimate users.
Create compelling content and offer it for a compelling price. Those who see the value will buy it; those who don't see the value would never have bought it to begin with, so you aren't really losing anything.

For images, you can put in a watermark (translucent text over the image, not very noticeable, saying something like "© 2010 Me inc.").
The same goes for video files, but in a video you could move the watermark around to make removing it (which is already extremely hard) even harder.
For presentations I have no clue either, but you could always try putting "© 2010 Me inc." at the bottom of every slide, or on the background picture.
In truth, there is no way to fully protect your files, but these solutions will do the best to slow down, and possibly stop the user from redistributing your work.

Pitch identification in Linux [closed]

Closed 10 years ago.
Is there any free software tool or combination that allows me to identify the pitch of a recorded singing session?
The idea is to display some kind of graph with the current pitch in a time line along with markers for the standard notes (C3, C#3, D, etc). I don't need pitch correction and I don't need it to be done in real time, either.
I know that once there was a plugin for Rosegarden that did that, but it has gone missing.

Check out Audacity. It came out of a project to do musical pitch analysis.

Not exactly what you are looking for, but the Singstar lookalike Ultrastar-NG at least does something like this.
http://ultrastar-ng.sourceforge.net/

I'm unaware of any software package that has this built in. If you're interested in writing something like this, you'll want to look at Discrete Fourier Transforms. This turns a time-series sample into a collection of frequencies. But this leaves you with no information about when the various frequencies occur, so you must do a windowed Fourier Transform, with windows of whatever time-resolution you want. Increasing the time resolution decreases the frequency resolution, however.
The simplest thing to do is to find the largest frequency component in each window and call that the pitch. But real music (a) has chords and (b) has overtones and undertones. In addition, singing often has vibrato, where the singer varies the actual pitch around the theoretical pitch the music is marked at.
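
As a rough illustration of that simplest approach, here is a pure-Python sketch that takes a naive DFT of each window, picks the strongest bin, and maps it to the nearest note. Everything in it is an assumption made for the example: the function names, the non-overlapping rectangular windows, and equal temperament with A4 = 440 Hz. A real tool would use an FFT library and a tapered window, and would have to cope with the overtones and vibrato mentioned above.

```python
import cmath
import math

NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")


def dominant_frequency(window, sample_rate):
    """Return the frequency in Hz of the strongest DFT bin in one window."""
    n = len(window)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(window))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n  # bins are sample_rate / n Hz apart


def note_name(freq):
    """Map a frequency to the nearest equal-tempered note (A4 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def pitch_track(samples, sample_rate, window_size):
    """Return (start_seconds, frequency_hz, note) for consecutive windows."""
    track = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        freq = dominant_frequency(window, sample_rate)
        note = note_name(freq) if freq > 0 else None  # None for silence
        track.append((start / sample_rate, freq, note))
    return track
```

The trade-off described above shows up directly in the last line of `dominant_frequency`: a shorter window gives finer time steps but wider frequency bins, so neighbouring notes become harder to tell apart.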

Praat will at least do automatic pitch estimation of complex sounds. Though I don't know if it can mark the standard notes as you requested.
Rob

How do I search content, within audio files/streams? [closed]

Closed 10 years ago.
I have always wondered how many different search techniques existed, for searching text, for searching images and even for videos.
However, I have never come across a solution that searched for content within audio files.
For example: let us assume that I have about 200 podcasts downloaded to my PC in the form of mp3, wav and ogg files. They are all named generically, say podcast1.mp3, podcast2.mp3, etc., so it is not possible to know what the content is without actually listening to them. Let's say that I am interested in finding out which of the podcasts talk about 'game programming'. I want the results to be shown as:
Podcast1.mp3 - 3 result(s) at time index(es) - 0:16:21, 0:43:45, 1:12:31
Podcast21.ogg - 1 result(s) at time index(es) - 0:12:01
So my questions:
How could one approach this problem?
Are there any suitable algorithms developed to do something like this?
One idea that cropped up in my mind was that one could use 'speech-to-text' software to get transcripts along with time indexes for each of the audio files, then parse the transcripts to get the output.
I was considering this as one of my hobby projects.
Thanks!

If you want to search for text (i.e. what is being said) inside an audio stream you would have to process it with some kind of speech recognition algorithm and store the text as meta data associated with the files. For video you could also do text recognition for text inside the video. Evernote already does this for text inside image files, but has no support for audio as far as I know.
Something similar is possible when using audio to search for audio. I don't know the details of these algorithms, but I'm guessing they involve some kind of frequency analysis. Shazam is using this kind of technology to identify songs based on audio clips.
Here are some Wikipedia articles that may be useful:
Speech recognition
Fast Fourier transform
Frequency analysis (frequency spectrum)
Optical character recognition (OCR)
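
Once transcripts with time indexes exist, the search step itself is simple. The sketch below assumes a speech-to-text step has already produced, for each file, a list of (start_seconds, word) pairs; that data format and the function names are hypothetical and chosen only for this example. It does exact phrase matching and prints results in the format asked for in the question.

```python
def format_time(seconds):
    """Format seconds as h:mm:ss, matching the question's time indexes."""
    seconds = int(seconds)
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def find_phrase(transcript, phrase):
    """Return the start times where the words of phrase occur consecutively.

    transcript is a list of (start_seconds, word) pairs (assumed format).
    """
    target = phrase.lower().split()
    if not target:
        return []
    # Normalise case and strip simple punctuation before comparing.
    words = [word.lower().strip(".,!?") for _, word in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits


def search(transcripts, phrase):
    """Return one report line per file that contains the phrase."""
    lines = []
    for name, transcript in transcripts.items():
        hits = find_phrase(transcript, phrase)
        if hits:
            times = ", ".join(format_time(t) for t in hits)
            lines.append(
                f"{name} - {len(hits)} result(s) at time index(es) - {times}")
    return lines
```

Exact matching will miss misrecognised words, so a real project would probably want fuzzy matching or a proper text index on top of this.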
