I have a tab-delimited text file that is many GB in size. The task is to prefix each value with its column's header text (producing header=value pairs joined by &). Currently I use a StreamReader to read the file line by line and build each output line, which takes a long time. Is there a way to make it faster? I was wondering whether the file could be processed column-wise. One option would be to import the file into a database table and then bcp the data out after appending the headers. Is there a better way, perhaps by calling PowerShell or awk/sed from the C# code?
The code is as follows:
// filePath and outFilePath are placeholders for the input and output paths.
using (StreamReader sr = new StreamReader(filePath, System.Text.Encoding.Default))
using (System.IO.StreamWriter outFileSw = new System.IO.StreamWriter(outFilePath))
{
    // The first line holds the column headers.
    string mainLine = sr.ReadLine();
    string[] fileHeaders = mainLine.Split(new string[] { "\t" }, StringSplitOptions.None);
    string newLine = "";
    while (!sr.EndOfStream)
    {
        mainLine = sr.ReadLine();
        string[] originalLine = mainLine.Split(new string[] { "\t" }, StringSplitOptions.None);
        newLine = "";
        // Build "header=value&header=value&..." for every non-blank header.
        for (int i = 0; i < fileHeaders.Length; i++)
        {
            if (fileHeaders[i].Trim() != "")
                newLine = newLine + fileHeaders[i].Trim() + "=" + originalLine[i].Trim() + "&";
        }
        // Drop the trailing '&' before writing the line.
        outFileSw.WriteLine(newLine.Remove(newLine.Length - 1));
    }
}
Nothing else that operates purely on text files is going to be significantly faster: fundamentally you have to read the whole input file and create a whole new output file, because you can't "insert" text for each column in place.
Using a database would almost certainly be a better idea in general, but adding a column could still end up being a relatively slow business.
You can improve how you're dealing with each line, however. In this code:
for (int i = 0; i < fileHeaders.Length; i++)
{
    if (fileHeaders[i].Trim() != "")
        newLine = newLine + fileHeaders[i].Trim() + "=" + originalLine[i].Trim() + "&";
}
... you're using string concatenation in a loop. Each + creates a brand-new string and copies everything built so far, so the work per line grows roughly quadratically with the number of columns. A StringBuilder appends into a growable buffer instead and is very likely to be more efficient. Additionally, there's no need to call Trim() on every string in fileHeaders on every line: work out once which columns you want, trim those headers, and then just pick those columns out of each line.
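A minimal sketch of both suggestions, written in Java rather than C# (the same ideas map directly onto StringBuilder and StreamReader/StreamWriter in .NET). The class name HeaderAppender and its methods are hypothetical. It assumes the first line holds the headers, treats a row with fewer fields than headers as having empty values for the missing columns (the original code would throw), and writes '\n' line endings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

class HeaderAppender {
    // Reads a tab-delimited stream whose first line holds the headers and writes
    // each data row as "header=value&header=value...".
    static void transform(BufferedReader in, Writer out) throws IOException {
        String headerLine = in.readLine();
        if (headerLine == null) return; // empty input: nothing to do
        String[] headers = headerLine.split("\t", -1);

        // Work out once which columns have a non-blank header, and trim them once.
        List<Integer> keep = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < headers.length; i++) {
            String h = headers[i].trim();
            if (!h.isEmpty()) {
                keep.add(i);
                names.add(h);
            }
        }

        StringBuilder sb = new StringBuilder(); // reused for every row
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split("\t", -1); // -1 keeps trailing empty fields
            sb.setLength(0);
            for (int k = 0; k < keep.size(); k++) {
                int col = keep.get(k);
                // Assumption: a missing field is written as an empty value.
                String value = col < cols.length ? cols[col].trim() : "";
                if (k > 0) sb.append('&');
                sb.append(names.get(k)).append('=').append(value);
            }
            out.write(sb.toString());
            out.write('\n');
        }
    }

    // Convenience overload for in-memory text.
    static String transform(String text) {
        StringWriter out = new StringWriter();
        try {
            transform(new BufferedReader(new StringReader(text)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
        return out.toString();
    }
}
```

For real files you would pass Files.newBufferedReader(path, charset) and Files.newBufferedWriter(path, charset) from java.nio.file; the charset matters because the question reads with Encoding.Default. Since reading and writing the whole file is unavoidable, these changes reduce the CPU work per line rather than the I/O itself.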
I have a string.
string str = "TTFTTFFTTTTF";
How can I split this string into groups of three characters and insert a "," between them?
The result should be: TTF,TTF,FTT,TTF
You could use String.Join after grouping the characters into chunks of three:
var groups = str.Select((c, ix) => new { Char = c, Index = ix })
.GroupBy(x => x.Index / 3)
.Select(g => String.Concat(g.Select(x => x.Char)));
string result = string.Join(",", groups);
Since you're new to programming: that's a LINQ query, so you need to add using System.Linq; at the top of your code file.
The Select extension method creates an anonymous type containing the char and the index of each char.
GroupBy groups them by the result of index / 3, which is integer division and truncates the fractional part: indices 0 to 2 map to group 0, indices 3 to 5 to group 1, and so on. That's why you get groups of three.
String.Concat creates a string from the 3 characters.
String.Join concatenates them and inserts a comma delimiter between each.
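For comparison, here is the same group-by-index / 3 idea sketched with Java streams. Collectors.groupingBy plays the role of GroupBy, a TreeMap keeps the groups in index order, and Collectors.joining plays the roles of String.Concat and String.Join. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class Chunker {
    // Groups character indices by index / size, concatenates each group,
    // then joins the groups with commas.
    static String joinChunks(String str, int size) {
        Map<Integer, String> groups = IntStream.range(0, str.length())
                .boxed()
                .collect(Collectors.groupingBy(
                        i -> i / size,   // integer division: 0,1,2 -> 0; 3,4,5 -> 1; ...
                        TreeMap::new,    // keep the groups in index order
                        Collectors.mapping(i -> String.valueOf(str.charAt(i)), Collectors.joining())));
        return String.join(",", groups.values());
    }
}
```

Calling Chunker.joinChunks("TTFTTFFTTTTF", 3) gives TTF,TTF,FTT,TTF; a final group shorter than three characters is simply kept as is.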
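Here is a really simple solution using StringBuilder:
var stringBuilder = new StringBuilder();
for (int i = 0; i < str.Length; i += 3)
{
    // Math.Min keeps the last chunk from running past the end of the string.
    stringBuilder.AppendFormat("{0},", str.Substring(i, Math.Min(3, str.Length - i)));
}
// Remove the trailing comma (only if anything was appended).
if (stringBuilder.Length > 0)
    stringBuilder.Length -= 1;
str = stringBuilder.ToString();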
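Chaining two Append calls instead of AppendFormat may be slightly faster, since it avoids parsing the format string on every iteration:
stringBuilder.Append(str.Substring(i, Math.Min(3, str.Length - i))).Append(',');
I would suggest avoiding LINQ in this case: it allocates an anonymous object for every character plus a group for every chunk, which is a lot of extra work for such a simple task.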
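A Java sketch of the same StringBuilder loop, with hypothetical names. It uses Math.min so that a final chunk shorter than the chunk size is handled, and it only strips the trailing comma when something was appended.

```java
class ChunkBuilder {
    // Appends each chunk followed by a comma, then removes the final comma.
    static String insertCommas(String str, int size) {
        StringBuilder sb = new StringBuilder(str.length() + str.length() / size);
        for (int i = 0; i < str.length(); i += size) {
            // append(CharSequence, start, end) copies str[i, end) without a substring allocation
            sb.append(str, i, Math.min(i + size, str.length())).append(',');
        }
        if (sb.length() > 0) sb.setLength(sb.length() - 1); // drop the trailing ','
        return sb.toString();
    }
}
```

The initial capacity is only an estimate of the final length; StringBuilder grows if needed, so it affects speed, not correctness.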
You can use Insert:
String.Insert places one string inside another at a given index. Because strings are immutable, it returns a new string rather than modifying the original.
Tip 1:
You can insert one string at any index of another; IndexOf can return a suitable index.
Tip 2:
Insert can be used to concatenate strings, but that is less efficient than concatenating with + (String.Concat).
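// Each Insert makes the string one character longer, so the next
// insertion point is 3 characters plus the new comma further on.
for (int i = 3; i < str.Length; i += 4)
{
    str = str.Insert(i, ",");
}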
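The same insert-as-you-go loop sketched in Java, using StringBuilder.insert so the buffer is modified in place instead of allocating a new string for every comma (each insert still shifts the characters after it). The step is size + 1 because every insertion makes the buffer one character longer; with a chunk size of 3 that is the i += 4 of the C# loop. The names are hypothetical.

```java
class CommaInserter {
    // Inserts a comma after every `size` characters, left to right.
    static String insertEvery(String str, int size) {
        StringBuilder sb = new StringBuilder(str);
        // sb.length() is re-read each iteration, so it accounts for inserted commas;
        // the strict < avoids a trailing comma when the length is a multiple of size.
        for (int i = size; i < sb.length(); i += size + 1) {
            sb.insert(i, ',');
        }
        return sb.toString();
    }
}
```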
My text file:
3.456 5.234 Saturday 4.15am
2.341 6.4556 Saturday 6.08am
From the first line, I want to read only 3.456 and 5.234.
From the second line, I want to read only 2.341 and 6.4556.
The same applies to any following lines.
Here's my code so far:
InputStream instream = openFileInput("myfilename.txt");
if (instream != null) {
    InputStreamReader inputreader = new InputStreamReader(instream);
    BufferedReader buffreader = new BufferedReader(inputreader);
    String line = null;
    while ((line = buffreader.readLine()) != null) {
    }
}
Thanks for showing some effort. Try this:
while ((line = buffreader.readLine()) != null) {
    String[] parts = line.split(" ");
    double x = Double.parseDouble(parts[0]);
    double y = Double.parseDouble(parts[1]);
}
I typed this from memory, so there might be syntax errors.
int linenumber = 1;
while ((line = buffreader.readLine()) != null) {
    // trim() drops leading/trailing whitespace; "\\s+" treats any run of
    // spaces or tabs as a single separator, so no empty strings appear.
    String[] parts = line.trim().split("\\s+");
    System.out.println("Line " + linenumber + "-> First Double: " + parts[0]
            + " Second Double: " + parts[1]);
    linenumber++;
}
Bilbert's code is almost right, but splitting on a single space produces empty strings in the array whenever two fields are separated by more than one space (or by a tab). Trimming the line and splitting on the regular expression "\\s+" avoids that by treating each run of whitespace as one separator. I also added a line number to the output, so you can see which line contains what. It should work fine.
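A Java sketch that reads all the pairs at once, with hypothetical names (FirstTwoDoubles, read). It assumes every non-blank line starts with two numeric fields; a shorter line throws ArrayIndexOutOfBoundsException and a non-numeric field throws NumberFormatException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FirstTwoDoubles {
    // Parses the first two whitespace-separated fields of each line as doubles.
    static List<double[]> read(BufferedReader reader) throws IOException {
        List<double[]> pairs = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue; // skip blank lines
            String[] parts = trimmed.split("\\s+"); // any run of spaces/tabs is one separator
            pairs.add(new double[] { Double.parseDouble(parts[0]), Double.parseDouble(parts[1]) });
        }
        return pairs;
    }

    // Convenience overload for in-memory text.
    static List<double[]> read(String text) {
        try {
            return read(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
    }
}
```

On Android you could call it as FirstTwoDoubles.read(new BufferedReader(new InputStreamReader(openFileInput("myfilename.txt")))) inside a try block that handles IOException.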
What is the best way to split a string in C++/CLI when the separator is an unknown number of spaces?
Edit: The problem is that the number of spaces is unknown, so when I use the Split method like this:
String^ line;
StreamReader^ SCR = gcnew StreamReader("input.txt");
while ((line = SCR->ReadLine()) != nullptr)
{
    if (line->IndexOf(' ') != -1)
        for each (String^ SCS in line->Split(nullptr, 2))
        {
            //Load the lines...
        }
}
And this is an example of what input.txt looks like:
ThisISSomeTxt<space><space><space><tab>PartNumberTwo<space>PartNumber3
When I run the program, the first line that is loaded is "ThisISSomeTxt", the second, third and fourth are "" (empty), the fifth is " PartNumberTwo" and the sixth is "PartNumber3".
I only want ThisISSomeTxt and PartNumberTwo to be loaded. How can I do this?
Why not just use System::String::Split(...)?
The following code example, taken from http://msdn.microsoft.com/en-us/library/b873y76a(v=vs.80).aspx#Y0, demonstrates how to tokenize a string with the Split method.
using namespace System;
using namespace System::Collections;

int main()
{
    String^ words = "this is a list of words, with: a bit of punctuation.";
    // Split on spaces, commas, periods and colons.
    array<Char>^ chars = {' ', ',', '.', ':'};
    array<String^>^ split = words->Split(chars);
    IEnumerator^ myEnum = split->GetEnumerator();
    while (myEnum->MoveNext())
    {
        String^ s = safe_cast<String^>(myEnum->Current);
        // Adjacent delimiters (e.g. ", ") produce empty entries; skip them.
        if (!s->Trim()->Equals(""))
            Console::WriteLine(s);
    }
}
I think you can do what you need to do with the String.Split method.
First, I think you're expecting the 'count' parameter to work differently: you're passing in 2 and expecting the first and second results to be returned and the third to be thrown out. What it actually returns is the first result, followed by the rest of the string (the second and third parts together) as a single string. If all you want is ThisISSomeTxt and PartNumberTwo, you'll need to throw away the results after the first two yourself.
As far as I can tell, you don't want any whitespace included in the returned strings. If that's the case, I think this is what you want:
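using namespace System;
using namespace System::Diagnostics;

String^ line = "ThisISSomeTxt   \tPartNumberTwo PartNumber3";
// A null separator array means "split on whitespace"; RemoveEmptyEntries drops
// the empty strings that runs of spaces and tabs would otherwise produce.
array<String^>^ split = line->Split((array<String^>^)nullptr, StringSplitOptions::RemoveEmptyEntries);
// Keep only the first two tokens.
for (int i = 0; i < split->Length && i < 2; i++)
{
    Debug::WriteLine("{0}: '{1}'", i, split[i]);
}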
Results:
0: 'ThisISSomeTxt'
1: 'PartNumberTwo'
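Java's String.split has a limit parameter that behaves much like the count argument described above, which makes for a compact illustration of the difference. A sketch with hypothetical names: splitWithLimit keeps the remainder in the last piece, while firstTokens splits on every whitespace run and keeps only the first n tokens, which is what the question asks for.

```java
import java.util.Arrays;

class TokenDemo {
    // Like .NET's count argument, the limit caps the number of pieces:
    // everything after the (limit - 1)th separator stays together in the last piece.
    static String[] splitWithLimit(String line, int limit) {
        return line.trim().split("\\s+", limit);
    }

    // Splits on any run of whitespace (spaces, tabs) and keeps at most n tokens.
    static String[] firstTokens(String line, int n) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return new String[0]; // "".split(...) would yield [""]
        String[] all = trimmed.split("\\s+");
        return Arrays.copyOf(all, Math.min(n, all.length));
    }
}
```

With the question's input, splitWithLimit(line, 2) returns "ThisISSomeTxt" and "PartNumberTwo PartNumber3", whereas firstTokens(line, 2) returns "ThisISSomeTxt" and "PartNumberTwo".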
I am very new to C#, and I am using Mono. I want to loop through each line in a TextView object and do my own processing on each line. I have narrowed it down to the Buffer property, which has a Text property, but that property contains the whole text. How do I break it into separate lines/strings?
string Line;
for (int i = 0;i < txtvMain.Buffer.LineCount; i++)
{
Line = txtvMain.Buffer.?;
}
(Untested) A simple solution, though perhaps not the most efficient:
string[] lines = txtvMain.Buffer.Text.Split('\n');
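One caveat: if the text can contain Windows line endings ("\r\n"), splitting on '\n' alone leaves a '\r' at the end of each line. As a Java illustration with hypothetical names, the \R pattern (available since Java 8) treats any line break, including "\r\n", as a single separator:

```java
import java.util.Arrays;
import java.util.List;

class LineSplitter {
    // "\\R" matches "\r\n" as one break, as well as lone "\n" or "\r".
    // Note: split() with no limit drops trailing empty strings (e.g. a final newline).
    static List<String> lines(String text) {
        return Arrays.asList(text.split("\\R"));
    }
}
```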
I've come across this several times in a couple of years of programming, so I decided to do some research to see if it was possible. I often create data structures in code that are initialized in a table-like manner, with rows and columns, and I would like a table-to-text feature for code readability. How can you create a table in Word, Excel, or some other program, and output the cells of the table as text aligned with spaces (not tabs)? Word can do it with tabs, and Excel can do it with misaligned spaces. Is there any program out there that automates this?
Have you tried using a monospace font, such as Courier, when you export from Excel? Most fonts adjust spacing based on the width, height and kerning of each character, but in a monospace font every character has the same width, so you can use spaces for alignment.
As for converting tabs to spaces automagically, there must be 100s if not 1000s of methods, apps, commands available out there.
I spent an hour or two researching this. I experimented with Excel and Word, and both came so close to an exact solution that it drove me crazy. I tried other programs online, with no luck. Here's my solution: Microsoft Word's Table-To-Text feature plus a custom C# program that converts the Word-tabified text into column-aligned text using spaces instead of tabs.
1) Put your columns and rows in an MS Word Table
2) Convert table to text with tabs (look up how to do this)
3) Save the converted table to a plain text file
4) Use my program to open and convert the file
5) Copy the text in the output file to your code
Below is the C# Windows Forms application I wrote. I apologize for the lack of optimization; I was at work and wanted it done as quickly as possible:
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;

namespace WindowsFormsApplication1
{
    static class Program
    {
        [STAThread]
        static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);

            OpenFileDialog of = new OpenFileDialog();
            of.Title = "Select Tabbed Text File To Convert";
            if (of.ShowDialog() != DialogResult.OK)
                return;

            // Read every line of the tab-delimited file into a list.
            List<string> lines = new List<string>();
            using (StreamReader s = new StreamReader(of.OpenFile()))
            {
                string line;
                while ((line = s.ReadLine()) != null)
                    lines.Add(line);
            }
            if (lines.Count == 0)
                return;

            // Count the tabs in the first line. This assumes good input,
            // i.e. all lines have the same number of tabs.
            int numTabs = 0;
            foreach (char c in lines[0])
                if (c == '\t')
                    numTabs++;

            // Each pass replaces the first remaining tab in every line.
            for (int i = 0; i < numTabs; i++)
            {
                // Find the "deepest" position of the first tab across all lines.
                int tabIndex = 0;
                foreach (string l in lines)
                {
                    int index = l.IndexOf('\t');
                    if (index > tabIndex)
                        tabIndex = index;
                }

                // Replace the first tab of each line with enough spaces to move
                // the next column to one past the deepest tab position.
                for (int l = 0; l < lines.Count; l++)
                {
                    int index = lines[l].IndexOf('\t');
                    if (index < 0)
                        continue;
                    int numSpaces = (tabIndex - index) + 1;
                    lines[l] = lines[l].Remove(index, 1).Insert(index, new string(' ', numSpaces));
                }
            }

            // Write "<name> (Aligned)<ext>" next to the input file.
            string outputFile = Path.Combine(
                Path.GetDirectoryName(of.FileName),
                Path.GetFileNameWithoutExtension(of.FileName) + " (Aligned)" + Path.GetExtension(of.FileName));
            using (StreamWriter w = new StreamWriter(outputFile))
            {
                foreach (string l in lines)
                    w.Write(l + "\r\n");
            }

            MessageBox.Show("Created the file: " + outputFile);
        }
    }
}
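A shorter alternative, sketched in Java: instead of repeatedly finding the deepest tab and replacing one tab per pass, compute each column's maximum width once and pad every cell except the last to that width plus one space. This should give the same layout as the program above for well-formed input, and it also tolerates rows with different numbers of fields. ColumnAligner and alignColumns are hypothetical names, and widths are counted in characters, which only lines up visually in a monospace font.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnAligner {
    // Pads every column except the last to (widest cell in that column + 1) characters.
    static List<String> alignColumns(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        int[] widths = new int[0];
        for (String line : lines) {
            String[] cells = line.split("\t", -1); // -1 keeps empty trailing cells
            rows.add(cells);
            if (cells.length > widths.length) {
                widths = Arrays.copyOf(widths, cells.length);
            }
            for (int c = 0; c < cells.length; c++) {
                widths[c] = Math.max(widths[c], cells[c].length());
            }
        }
        List<String> out = new ArrayList<>();
        for (String[] cells : rows) {
            StringBuilder sb = new StringBuilder();
            for (int c = 0; c < cells.length; c++) {
                sb.append(cells[c]);
                if (c < cells.length - 1) {
                    // pad to the column width, plus one separating space
                    for (int k = cells[c].length(); k <= widths[c]; k++) sb.append(' ');
                }
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

The two-pass design (measure everything, then pad) means each line is touched a constant number of times, instead of once per tab as in the pass-per-column version.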