Split flat file into multiple files - visual-studio-2012

I need to create a package that splits a huge flat file into multiple flat files.
The flat file has 20 million rows, and each output file needs to hold 55k rows.
Example: if there were 111 rows in total and 55 rows per file, I would have to create 3 files:
file1.txt will have rows 1-55
file2.txt will have rows 56-110
file3.txt will have 1 row (row 111)
What options do I have?
I am using Visual Studio 2012 for this project.

You could try something like this. It is pretty rudimentary, and I am sure someone will point out that it is not the most efficient approach, but it is a solid option. Note that you will need to add some try/catch error handling.
// Requires: using System.IO; using System.Text;
int recper = 55000; // number of records per file; adjust as needed
int reccount = 0;
int filecount = 1;
string filename = "testfilename";
string networkDirectory = @"c:\fakepath\";
string fileToRead = @"c:\fakepath\textfile.txt";
using (StreamReader reader = new StreamReader(fileToRead, Encoding.Default, true))
{
    // Peek() returns -1 at the end of the stream, so compare with >= 0
    while (reader.Peek() >= 0)
    {
        // Open the current output file in append mode and add one line
        using (StreamWriter writer = new StreamWriter(Path.Combine(networkDirectory, filename + filecount + ".txt"), true, Encoding.Default))
        {
            writer.WriteLine(reader.ReadLine());
        }
        reccount++;
        // On each iteration, check whether the current record count
        // matches the number of records per file; if so, reset reccount
        // and increment filecount to change the file name
        if (reccount == recper)
        {
            reccount = 0;
            filecount++;
        }
    }
}
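The same splitting loop can be sketched in Python for comparison. This is only an illustration under assumptions: the function name `split_file`, its parameters, and the UTF-8 encoding are hypothetical and not part of the answer above. Unlike the C# snippet, it keeps one writer open per output file instead of reopening the file for every line.

```python
import os
import tempfile


def split_file(source_path, out_dir, base_name, lines_per_file):
    """Split source_path into numbered files of at most lines_per_file lines.

    Returns the list of output paths that were written.
    """
    if lines_per_file <= 0:
        raise ValueError("lines_per_file must be positive")

    written = []
    writer = None
    try:
        with open(source_path, "r", encoding="utf-8", newline="") as reader:
            for index, line in enumerate(reader):
                # Start a new output file every lines_per_file lines.
                if index % lines_per_file == 0:
                    if writer is not None:
                        writer.close()
                    path = os.path.join(out_dir, f"{base_name}{len(written) + 1}.txt")
                    writer = open(path, "w", encoding="utf-8", newline="")
                    written.append(path)
                writer.write(line)
    finally:
        if writer is not None:
            writer.close()
    return written


if __name__ == "__main__":
    # Small demonstration: 111 lines split into files of 55 lines each.
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "textfile.txt")
        with open(source, "w", encoding="utf-8") as f:
            f.writelines(f"row {n}\n" for n in range(1, 112))
        for path in split_file(source, tmp, "testfilename", 55):
            with open(path, encoding="utf-8") as f:
                print(os.path.basename(path), sum(1 for _ in f))
```

Because the input is streamed line by line, memory use should stay flat even for a 20 million row file.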

Another way to do this is in a data flow:
First, use your method of choice to add a "Row Number" column to your data flow (unless your flat file output already has one, in which case skip this step and use that):
https://www.google.com/search?sourceid=navclient&aq=&oq=add+rownumber+column+to+ssis+dataflow&ie=UTF-8&rlz=1T4GGNI_enUS551US551&q=add+rownumber+column+to+ssis+dataflow&gs_l=hp....0.0.0.6218...........0._Qm62-0x_YQ
Then add a Conditional Split transformation to your data flow (a Multicast would copy every row to every output), and use the row number to route each row to a different destination:
Rows 1 - 55,000 -> File1
Rows 55,001 - 110,000 -> File2
etc.
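The row ranges above reduce to integer arithmetic on the row number. The following Python sketch shows that arithmetic only; it is not SSIS expression syntax, and the names `file_number` and `files_needed` are hypothetical.

```python
import math

ROWS_PER_FILE = 55_000  # chunk size from the question


def file_number(row_number, rows_per_file=ROWS_PER_FILE):
    """Return the 1-based output file for a 1-based row number."""
    return (row_number - 1) // rows_per_file + 1


def files_needed(total_rows, rows_per_file=ROWS_PER_FILE):
    """Return how many output files a given row count requires."""
    return math.ceil(total_rows / rows_per_file)


if __name__ == "__main__":
    print(file_number(55_000))       # last row that belongs to File1
    print(file_number(55_001))       # first row that belongs to File2
    print(files_needed(20_000_000))  # files required for the 20 million row input
```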

Batch Bookmark & Extraction of Bookmarked lines into MULTIPLE new file in EmEditor

I would like to ask a follow-up to this question: How to search and bookmark multiple strings at once in EmEditor?
@Yutaka gave this response: https://i.stack.imgur.com/HLYoo.png, which leads to my follow-up question.
How can the lines for each bookmarked string be saved or opened separately in EmEditor when using "extract bookmark lines to new file"?
I would appreciate your response.
I expect the results for each bookmarked string to open in a separate file in EmEditor.
Batch Extract
The Batch Extract button in the Batch Find dialog will output all matched lines to a file, with labels indicating the entry that the lines matched with.
Example output:
===== a =====
a
apple
===== b =====
b
Macro
This macro finds the given array of strings with the extract option, and it saves each result to separate files. Make sure to change the marked lines.
var filePath = document.FullName;
var strings = ["a", "b", "e"]; // Change these
for (var i = 0; i < strings.length; i++) {
    editor.OpenFile(filePath);
    document.selection.Find(strings[i], eeFindSelectAll | eeFindExtract, 0);
    document.Save("%HOMEPATH%\\Downloads\\" + strings[i] + ".txt"); // Change path
}
I tested it on this file. Make sure to save it before running the macro.
a
b
c
d
apple
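Outside EmEditor, the per-string extraction that the macro performs can be approximated with a short Python sketch. It assumes a plain, case-sensitive substring match and search strings that are safe to use as file names; `extract_matches` and `save_extracts` are hypothetical names, not EmEditor APIs.

```python
import os
import tempfile


def extract_matches(lines, search_strings):
    """Map each search string to the lines that contain it (substring match)."""
    return {s: [line for line in lines if s in line] for s in search_strings}


def save_extracts(extracts, out_dir):
    """Write each group of matching lines to '<search string>.txt' in out_dir."""
    paths = []
    for search_string, matched in extracts.items():
        path = os.path.join(out_dir, search_string + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(matched) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = ["a", "b", "c", "d", "apple"]  # the test file from the answer
    extracts = extract_matches(sample, ["a", "b", "e"])
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_extracts(extracts, tmp):
            with open(path, encoding="utf-8") as f:
                print(os.path.basename(path), f.read().split())
```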

Matlab Data Preprocessing and Dynamic Struct Assignments

I'm quite new to Matlab and I'm struggling trying to figure out how to properly preprocess my data in order to make some calculations with it.
I have an Excel table with financial log returns of many companies such that every row is a day and every column is a company:
I imported everything correctly into Matlab like this:
Now I have to create what are called "rolling windows". To do this I use the following code:
function [ROLLING_WINDOWS] = setup_returns(RETURNS)
    bandwidth = 262;
    [rows, columns] = size(RETURNS);
    limit_rows = rows - bandwidth;
    for i = 1:limit_rows
        ROLLING_WINDOWS(i).SYS = RETURNS(i:bandwidth+i-1,1);
    end
end
If I run this code for the first column of returns, everything works fine, but my aim is to produce the same thing for every column of log returns. So basically I have to add a second for loop, but what I don't get is which syntax I need to use to make that ".SYS" dynamic and based on my cell array of strings containing company names, so that...
ROLLING_WINDOWS(i)."S&P 500" = RETURNS(i:bandwidth+i-1,1);
ROLLING_WINDOWS(i)."AIG" = RETURNS(i:bandwidth+i-1,2);
and so on...
Thanks for your help guys!
EDIT: working function
function [ROLLING_WINDOWS] = setup_returns(COMPANIES, RETURNS)
    bandwidth = 262;
    [rows, columns] = size(RETURNS);
    limit_rows = rows - bandwidth;
    for i = 1:limit_rows
        offset = bandwidth + i - 1;
        for j = 1:columns
            ROLLING_WINDOWS(i).(COMPANIES{j}) = RETURNS(i:offset, j);
        end
    end
end
OK, everything works. Just one question: the MATLAB code analyzer tells me "ROLLING_WINDOWS appears to change size on every loop iteration ... consider preallocating". How can I do this?
You're almost there. Use dynamic field names by building strings for fields. Your fields are in a cell array called COMPANIES and so:
function [ROLLING_WINDOWS] = setup_returns(COMPANIES, RETURNS)
    bandwidth = 262;
    [rows, columns] = size(RETURNS);
    limit_rows = rows - bandwidth;
    %// Preallocate to remove warnings
    ROLLING_WINDOWS = repmat(struct(), limit_rows, 1);
    for i = 1:limit_rows
        offset = bandwidth + i - 1;
        for j = 1:columns
            %// Dynamic field name referencing
            ROLLING_WINDOWS(i).(COMPANIES{j}) = RETURNS(i:offset, j);
        end
    end
end
Here's a great article by Loren Shure from MathWorks if you want to learn more: http://blogs.mathworks.com/loren/2005/12/13/use-dynamic-field-references/ ... but basically, if you have a string and you want to use this string to create a field, you would do:
str = '...';
s.(str) = ...;
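s is your structure and str is the string you want to name your field. Note that str must be a valid MATLAB identifier, so a name such as 'S&P 500' has to be converted first, for example with matlab.lang.makeValidName.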
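For readers who want to check the windowing logic outside MATLAB, here is a rough Python analogue that uses one dictionary per window in place of a struct with dynamic fields. It is a sketch under assumptions: `returns` is taken to be a list of rows (one per day, one value per company), and the window count mirrors the `rows - bandwidth` limit used above.

```python
def setup_returns(companies, returns, bandwidth=262):
    """Build rolling windows keyed by company name.

    returns is a list of rows (one per day); each row holds one value per
    company, in the same order as companies.
    """
    limit_rows = len(returns) - bandwidth  # same window count as the MATLAB code
    windows = []
    for i in range(max(limit_rows, 0)):
        window = {}
        for j, name in enumerate(companies):
            # Column j over the days i .. i + bandwidth - 1.
            window[name] = [row[j] for row in returns[i:i + bandwidth]]
        windows.append(window)
    return windows


if __name__ == "__main__":
    companies = ["S&P 500", "AIG"]
    returns = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0], [0.5, 5.0]]
    for window in setup_returns(companies, returns, bandwidth=3):
        print(window)
```

Dictionary keys can be arbitrary strings, so company names need no sanitizing in this version.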

OLE2 command for returning the number of columns in an Excel file

Is there an OLE2 command that can be used in Oracle Forms to return the number of columns in my Excel file?
I want to open an Excel file from Oracle Forms and go through all the columns, and then through just some of the columns.
Thanks!
To accomplish this I use a trivial solution: I iterate through the columns of the first row until I reach the first empty one. Of course, this relies on the first row of my file always having every column filled.
When the row is not fully filled, I use a manually defined constant instead...
I iterate through like this:
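Variant ws = /* set your worksheet here */;
int row = 1;   // first row, assumed to have every column filled
int count = 0; // number of non-empty columns found
for (int col = 1; toString(ws.olePropertyGet("Cells", row, col).olePropertyGet("Value")) != ""; ++col) {
    // do stuff
    ++count;
}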
It is dirty, but I never found a better way to do this, and I will follow this question to find a new one.
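The scan-until-first-empty-cell idea is independent of OLE and can be sketched in Python on an in-memory row. This is only an illustration of the counting logic; `count_header_columns` is a hypothetical helper, and reading the cells from Excel is left out.

```python
def count_header_columns(header_row):
    """Count the cells in header_row up to the first empty one.

    A cell counts as empty when it is None or contains only whitespace.
    """
    count = 0
    for cell in header_row:
        if cell is None or str(cell).strip() == "":
            break
        count += 1
    return count


if __name__ == "__main__":
    # A gap ends the scan, so the trailing "notes" column is not counted.
    print(count_header_columns(["id", "name", "price", None, "notes"]))
```

As in the answer, the count is only reliable when the scanned row has no gaps.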

My Excel file of CSV data from Google Apps Script doesn't mirror my Google Spreadsheet

I'm trying to write a script that passes information from a Google Spreadsheet, compiles it into a CSV file and emails that file.
My problem: the CSV file, when opened in Excel, looks very different from my Google Spreadsheet (dead link).
This is what my Excel file looks like, pasted into another Google Spreadsheet.
The code I am using is below:
function myFunction() {
  // Get the active sheet and the last row where data has been entered, define the range, and use that range to get the values (called data)
  var sheet = SpreadsheetApp.getActiveSheet();
  var lastRow = sheet.getLastRow();
  var range = sheet.getRange(1, 1, lastRow, 91);
  var data = range.getValues();
  // Define a string called csv
  var csv = "";
  // Loop through the data and join the values of each row, separated by commas
  for (var i = 0; i < data.length; ++i) {
    csv += data[i].join(",") + "\r\n";
  }
  var csvFiles = [{fileName: "example.csv", content: csv}];
  MailApp.sendEmail(Session.getUser().getEmail(), "New Journey Information", "", {attachments: csvFiles});
}
You need to ensure that individual cells' data is atomic. For instance, what you see as a time on the sheet contains a Date object when read by your script, then when that's written to a CSV it may be converted to a date & time string with commas, depending on your locale. (Jan 4, 2013 14:34 for example.) To be safe with punctuation that may be interpreted as delimiters by Excel, you should enclose each element with quotes.
There is no need to write this yourself, as the problem has been solved in one of the examples provided in the DocsList Tutorial. So an easy solution is to replace your code with that example.
Change the first bit of the provided saveAsCSV() as follows, and it will operate either with user input or by passing a filename as a parameter.
function saveAsCSV(filename) {
// Prompts the user for the file name, if filename parameter not provided
fileName = filename || Browser.inputBox("Save CSV file as (e.g. myCSVFile):");
...
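The quoting advice in this answer can be illustrated with Python's standard csv module. This is a sketch of the general technique, not the Apps Script tutorial code; `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text with every field enclosed in quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # The comma inside the date stays within one quoted field.
    print(rows_to_csv([["Journey", "Jan 4, 2013 14:34"]]), end="")
```

With every field quoted, a comma inside a value no longer splits that value across columns when Excel opens the file.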

Sharepoint 2010 How to use "File Size" column value in a formula?

I am trying to use "File Size" (aka "FileSizeDisplay") in a Calculated column formula.
"File Size" is an existing column (default SP not custom).
But it is not available in the "Insert Column" list of any library.
SharePoint also displays an error message stating that the column does not exist if it is added to a formula manually as either [File Size] or [FileSizeDisplay].
All I want to do is inform a user that an image is too big. Not trying to prohibit file size upload or anything technical like that. Just want a Calculated column to display a message.
If the column value was available the following would work:
=IF([File Size]>50000,"Image is too big","Image is sized correctly")
or
=IF([FileSizeDisplay]>50000,"Image is too big","Image is sized correctly")
Does anyone know why this column is not available?
Cheers
You'll want to get the file size first, as in the example below; then you can display a message in a pop-up or however you'd like.
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // The name of the file
        const string fileName = "test.txt";
        // Create new FileInfo object and get the Length.
        FileInfo f = new FileInfo(fileName);
        long s1 = f.Length;
        // Change something with the file. Just for demo.
        File.AppendAllText(fileName, " More characters.");
        // Create another FileInfo object and get the Length.
        FileInfo f2 = new FileInfo(fileName);
        long s2 = f2.Length;
        // Print out the length of the file before and after.
        Console.WriteLine("Before and after: " + s1.ToString() +
            " " + s2.ToString());
        // Get the difference between the two sizes.
        long change = s2 - s1;
        Console.WriteLine("Size increase: " + change.ToString());
    }
}
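Combining the size lookup with the threshold from the question's formula might look like the Python sketch below. It runs against a local file system, not inside SharePoint, and it assumes the 50000 limit is a byte count; `size_message` and `MAX_BYTES` are hypothetical names.

```python
import os
import tempfile

MAX_BYTES = 50_000  # threshold taken from the formula in the question (assumed to be bytes)


def size_message(path, max_bytes=MAX_BYTES):
    """Return the message the calculated column was meant to display."""
    if os.path.getsize(path) > max_bytes:
        return "Image is too big"
    return "Image is sized correctly"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "image.bin")
        with open(image, "wb") as f:
            f.write(b"\0" * 60_000)  # stand-in for an oversized image
        print(size_message(image))
```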
