Issue with string replacement on a lengthy string

I am facing an issue when trying to remove the newline character \n from a JSON string.
I serialize my response data using JsonConvert and then try to find and replace \n, but it doesn't find any newline characters even though the text contains several.
The serialized text has around 3k lines.
var jsonResponse = JsonConvert.SerializeObject(response);
string formatted = string.Empty;
// Not working
if (jsonResponse.Contains(Environment.NewLine))
{
    formatted = jsonResponse.Replace(Environment.NewLine, "");
}
But if I save the jsonResponse above to a .txt file and then read all the text back into a variable, it works fine: it finds the newline characters and replaces them.
var text = System.IO.File.ReadAllText(@"D:\TestData.txt");
// Working
if (text.Contains(Environment.NewLine))
{
    formatted = text.Replace(Environment.NewLine, "");
}
How can I make this work without loading from a text file? Any suggestions are welcome.
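A likely explanation, assuming the line breaks live inside string values of the response: a JSON serializer such as JsonConvert.SerializeObject escapes a line break inside a string as the two characters \ and n, so the serialized text contains no real newline for Contains(Environment.NewLine) to find. The sketch below reproduces this with JavaScript's JSON.stringify, used only as an analogue of JsonConvert, and then strips the line breaks from the string values before they get escaped.

```javascript
// A response whose string value contains a real line break (CR LF).
const response = { address: "100 RILEY DR\r\nAVONDALE" };

// Compact serialization: line breaks inside strings are escaped, not kept.
const json = JSON.stringify(response);
console.log(json.includes("\n"));  // false: no real newline in the output
console.log(json.includes("\\n")); // true: the two characters '\' and 'n'

// Clean the string values *before* they are escaped, using a replacer callback.
const cleaned = JSON.stringify(response, (key, value) =>
  typeof value === "string" ? value.replace(/\r?\n/g, " ") : value
);
console.log(cleaned); // {"address":"100 RILEY DR AVONDALE"}
```

In C# terms, this suggests searching jsonResponse for the escaped sequence ("\\r\\n" or "\\n") instead of Environment.NewLine, or cleaning the strings before serializing. Cleaning first is safer: blindly removing \n from serialized text can also corrupt a value that contains a literal backslash followed by the letter n.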

Related

Write json content in json format to a new file in Node js

Goal: write JSON content to a file using Node.js so that, when the file is opened manually, the content is displayed as formatted (pretty-printed) JSON.
I tried both fs-extra functions, outputJsonSync and writeFileSync, to write the JSON content to a file. Both write the content on a single line, like this:
{"a":"1", "b":"2"}
However, I would like to see the content as below when I open the file manually:
{
"a" : "1",
"b" : "2"
}
I tried jsome and pretty-data on the data as follows:
fs.outputJsonSync(jsome(data))
fs.outputJsonSync(pd.json(data))
These also write the data on a single line, only with extra \ or \n characters and tabs added to it, so the file still doesn't open in a formatted style.
Any input is highly appreciated. Thanks!
[Update]
Other scenario:
const obj = {"a":"1", "b":"2"}
var string = "abc" + "splitIt" + obj
doSomething(string)
And inside the function implementation:
doSomething(string){
var arr = string.split("splitIt")
var stringToWrite = JSON.stringify(arr[1], null, ' ').replace(/: "(?:[^"]+|\\")*"$/, ' $&')
fs.writeFileSync(filePath, stringToWrite)
}
Following output is displayed when I open the file:
"[object Object]"

Once you have the object, pass a spacing argument to JSON.stringify so that each key-value pair goes on its own line, use a regular expression to trim the leading spaces, and use another regular expression to insert a space before the : of each key-value pair. Then just write the formatted string to a file (the snippet below logs it instead):
const obj = {"a":"1", "b":"2"};
const stringToWrite = JSON.stringify(obj, null, ' ')
// Trim leading spaces:
.replace(/^ +/gm, '')
// Add a space after every key, before the `:`:
.replace(/: "(?:[^"]+|\\")*",?$/gm, ' $&');
console.log(stringToWrite);
That said, you may find the output more readable if you keep the leading spaces:
const obj = {"a":"1", "b":"2"};
const stringToWrite = JSON.stringify(obj, null, ' ')
// Add a space after every key, before the `:`:
.replace(/: "(?:[^"]+|\\")*",?$/gm, ' $&');
console.log(stringToWrite);

JSON.stringify has two optional parameters after the value: the first is a replacer (a function or an array of keys), and the second (the one you want) controls spacing.
const obj = {"a":"1", "b":"2"}
console.log(JSON.stringify(obj, null, 2))
This will give you:
{
  "a": "1",
  "b": "2"
}
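Putting the two answers together, here is a minimal Node.js sketch that uses plain fs (fs-extra re-exports the same writeFileSync). For the [Update] scenario, the object is serialized before it is concatenated, parsed back out after the split, and then written with a 2-space indent. The output path and the doSomething(message, filePath) signature are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Pretty-print with a 2-space indent; the trailing newline is a convention, not required.
function toPrettyJson(value) {
  return JSON.stringify(value, null, 2) + "\n";
}

// Hypothetical rewrite of the question's doSomething for the [Update] scenario.
// The caller serializes the object *before* concatenating, because
// "abc" + "splitIt" + obj turns obj into the string "[object Object]".
function doSomething(message, filePath) {
  const [, jsonPart] = message.split("splitIt");
  const parsed = JSON.parse(jsonPart); // back to a real object
  fs.writeFileSync(filePath, toPrettyJson(parsed), "utf8");
}

const obj = { a: "1", b: "2" };
const filePath = path.join(os.tmpdir(), "pretty-example.json"); // hypothetical output path
doSomething("abc" + "splitIt" + JSON.stringify(obj), filePath);
console.log(fs.readFileSync(filePath, "utf8"));
```

If you stay with fs-extra, outputJsonSync(file, data, { spaces: 2 }) should produce the same indentation, since its spaces option is passed through to JSON.stringify.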

Separated text line in Apache POI XWPFRun object

I'm trying to fill in a template DOCX document with Apache POI using the XWPFDocument class. I have tags in the document and a JSON file that supplies the replacement data. My problem is that some text is split up inside the DOCX: when I change its extension to .zip and open document.xml, the text [MEMBER_CONTACT_INFO] appears as two separate pieces, [MEMBER_CONTACT_INFO and ]. POI reads it the same way, because that is how the original DOCX stores it, so the paragraph contains two XWPFRun objects holding [MEMBER_CONTACT_INFO and ] separately.
My question is: is there a way to make POI behave like Word, for example by merging related runs? Or how else can I solve this problem? I'm matching run texts while replacing, and I can't find my tag because it is split across two different run objects.
Best

This wasted so much of my time once...
Basically, an XWPFParagraph is composed of multiple XWPFRuns, and an XWPFRun is a contiguous piece of text that shares the same style.
So when you type something like "[PLACEHOLDER_NAME]" in MS Word, it creates a single XWPFRun. But if you later add a few more things and then go back and change "[PLACEHOLDER_NAME]" to something else, there is no guarantee it will remain a single XWPFRun; it may well be split into two runs. AFAIK this is just how MS Word works.
How to avoid splitting of Runs in such cases?
Solution: there are two solutions that I know of:
1) Copy the text "[PLACEHOLDER_NAME]" to Notepad or a similar plain-text editor, make the necessary modification, then copy it back and paste it over "[PLACEHOLDER_NAME]" in your Word file. This way the whole "[PLACEHOLDER_NAME]" is replaced with the new text, avoiding split XWPFRuns.
2) Select "[PLACEHOLDER_NAME]", open MS Word's "Replace" option, and replace it with "[Your-new-edited-placeholder]". This guarantees that your new placeholder occupies a single XWPFRun.
If you need to change the new placeholder again, follow solution 1 or 2 again.

Here is Java code that fixes the split-text issue. It also handles a placeholder that is spread across several runs with different formatting.
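import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public final class DocxReplacer { // hypothetical class name
    public static void replaceString(XWPFDocument doc, String search, String replace) {
        for (XWPFParagraph p : doc.getParagraphs()) {
            replaceInParagraph(p, search, replace);
        }
        // Table cells hold their own paragraphs, so apply the same logic there.
        for (XWPFTable tbl : doc.getTables()) {
            for (XWPFTableRow row : tbl.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    for (XWPFParagraph p : cell.getParagraphs()) {
                        replaceInParagraph(p, search, replace);
                    }
                }
            }
        }
    }

    // Note: a split placeholder is only found when it begins at the start of a run
    // and ends at the end of one, as in the [MEMBER_CONTACT_INFO + ] case.
    private static void replaceInParagraph(XWPFParagraph p, String search, String replace) {
        List<XWPFRun> runs = p.getRuns();        // live view: shrinks when runs are removed
        List<Integer> group = new ArrayList<>(); // runs that together spell `search`
        String groupText = search;               // the part of `search` not matched yet
        for (int i = 0; i < runs.size(); i++) {
            XWPFRun r = runs.get(i);
            String text = r.getText(0);
            if (text == null || text.isEmpty()) {
                continue;
            }
            if (text.contains(search)) {
                // Whole placeholder inside one run; String.replace is literal, so no regex quoting.
                r.setText(text.replace(search, replace), 0);
                group.clear();
                groupText = search;
            } else if (groupText.startsWith(text)) {
                // This run holds the next piece of a split placeholder.
                group.add(i);
                groupText = groupText.substring(text.length());
                if (groupText.isEmpty()) {
                    runs.get(group.get(0)).setText(replace, 0);
                    // Remove from the highest index down so earlier indices stay valid.
                    for (int j = group.size() - 1; j >= 1; j--) {
                        p.removeRun(group.get(j));
                    }
                    i = group.get(0); // resume right after the merged run
                    group.clear();
                    groupText = search;
                }
            } else {
                // Mismatch: start over, possibly with this run as the first piece.
                group.clear();
                groupText = search;
                if (search.startsWith(text)) {
                    group.add(i);
                    groupText = search.substring(text.length());
                }
            }
        }
    }
}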
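The code above only finds a split placeholder when it starts at the beginning of one run and ends at the end of another. A more general approach is to match against the concatenated text of all runs and then map the match back onto the runs. The sketch below shows that idea on plain strings; replaceAcrossRuns is a hypothetical helper, and the array of strings stands in for the runs' getText(0) values. With POI you would write each result back with setText(text, 0) and could drop runs that became empty with removeRun(index), highest index first.

```javascript
// Hypothetical helper: `runs` stands in for the texts of a paragraph's XWPFRuns.
// Returns a new array where every occurrence of `search` in the joined text is
// replaced, even when it starts or ends in the middle of a run.
function replaceAcrossRuns(runs, search, replace) {
  const out = runs.slice();
  if (search === "") return out; // nothing sensible to match
  let from = 0;
  let hit;
  while ((hit = out.join("").indexOf(search, from)) !== -1) {
    const end = hit + search.length; // exclusive end of the match
    let pos = 0;
    let first = true;
    for (let i = 0; i < out.length; i++) {
      const runStart = pos;
      const runEnd = pos + out[i].length;
      pos = runEnd;
      if (runEnd <= hit || runStart >= end) continue; // run not touched by the match
      const before = hit > runStart ? out[i].slice(0, hit - runStart) : "";
      const after = end < runEnd ? out[i].slice(end - runStart) : "";
      // The first touched run receives the replacement (and keeps its formatting);
      // later touched runs keep only their text after the match.
      out[i] = first ? before + replace + after : after;
      first = false;
    }
    from = hit + replace.length; // continue searching after the inserted text
  }
  return out;
}

const runs = ["Dear ", "[MEMBER_CONTACT_INFO", "] and ", "[MEMBER_CONTACT_INFO]"];
console.log(replaceAcrossRuns(runs, "[MEMBER_CONTACT_INFO]", "Jane"));
// [ 'Dear ', 'Jane', ' and ', 'Jane' ]
```

The replacement inherits the formatting of the run where the placeholder starts, which is usually what a template author expects.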

It didn't work for me as expected every time. In my case I used ${PLACEHOLDER} in the text. First, you need to look at how Apache POI splits each paragraph into the runs we iterate over. If you dig into the DOCX file structure, you'll see that a run is a sequence of characters that share the same font style, size, colour, bold/italic settings, and so on. Because of that, a placeholder was sometimes split into several parts, and sometimes the whole paragraph was recognised as a single run, so it was impossible to iterate over individual words. What I did was make the placeholder name bold in the template document. Then, when iterating over the runs, I got the whole placeholder name ${PLACEHOLDER} as one run. I replaced the value with
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null && text.contains("originalText")) {
text = text.replace("originalText", "newText");
r.setText(text,0);
}
}
and then added just r.setBold(false); after setText.
That way the placeholder is recognised as a separate run, so I can replace that specific placeholder, and the processed document contains plain text with no bolding. An additional advantage for me was that I can spot placeholders in the template faster.
So the final loop looks like this:
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null && text.contains("originalText")) {
text = text.replace("originalText", "newText");
r.setText(text,0);
r.setBold(false); // isBold() is only a getter; setBold(false) removes the bolding
}
}
I hope this helps someone; I spent far too much time on it :)

I had this issue a few days ago as well and couldn't find a solution. I chose to use PLACEHOLDER_NAME instead of [PLACEHOLDER_NAME]. This works fine for me, and it's treated as a single XWPFRun object.

To make sure a word is treated as a single XWPFRun, you can use a merge field as a variable in Word:
1) Place the cursor on the word you want to be a single run.
2) Press CTRL+F9; a pair of gray braces { } appears.
3) Right-click the { } field and select Edit Field.
4) In the pop-up box, select Mail Merge from Categories and then MergeField from Field Names.
5) Click OK.

Selenium IDE: how to remove \n from stored string?

I am trying to extract the zip code from an address string, but it contains newline (\n) characters. How can I remove them from a Selenium stored variable? I tried storeEval | "${Addrsss}".replace("\n", "") | Address, but Selenium IDE returns the error: Threw an exception: unterminated string literal
Here is the address:
${Address} = "100 RILEY DR\n AVONDALE,\n ARIZONA\n 85323-2004"

Try this escape > replace > unescape sequence as a workaround to remove the newline characters:
1) Escape the value.
2) Replace the escaped newline character (%0A) with an empty string ('').
3) Unescape back to the original value.
storeEval | unescape(escape(storedVars['has_nl']).replace(/%0A/g,'')) | no_nl
This newline character appears to come from an HTML break tag (<br />) that the browser renders and Selenium IDE then extracts as a newline character (\n).
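As far as I can tell, the original storeEval fails because Selenium IDE substitutes ${Address} into the expression as raw text before evaluating it, so the real newline ends up inside a JavaScript string literal, which produces "unterminated string literal". Reading the value through storedVars avoids that. The plain-JavaScript sketch below mocks storedVars as an ordinary object, checks that the escape/replace/unescape workaround gives the same result as a direct /\n/g replace, and then extracts the zip code with a regex that assumes the US 5-digit or ZIP+4 format.

```javascript
// storedVars is mocked as a plain object; in Selenium IDE it is provided for you.
const storedVars = { has_nl: "100 RILEY DR\n AVONDALE,\n ARIZONA\n 85323-2004" };

// The workaround: escape() turns "\n" into "%0A", which is removed, then
// unescape() restores everything else. (escape/unescape are deprecated but still defined.)
const viaEscape = unescape(escape(storedVars["has_nl"]).replace(/%0A/g, ""));

// A direct replace on the stored value gives the same string.
const direct = storedVars["has_nl"].replace(/\n/g, "");

// Pull out a US ZIP or ZIP+4 code (assumed format).
const zip = direct.match(/\b\d{5}(?:-\d{4})?\b/)[0];

console.log(viaEscape);            // 100 RILEY DR AVONDALE, ARIZONA 85323-2004
console.log(viaEscape === direct); // true
console.log(zip);                  // 85323-2004
```

By the same reasoning, storeEval | storedVars['Address'].replace(/\n/g, '') | Address would likely work too, since the value is never pasted into a string literal.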

Possible approach:
1) Find a proper CSS (or XPath) locator for the address element.
2) Get the element's text, either with getText() or through the JavascriptExecutor:
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;

String cssSelector = "..blablabla.."; // your locator here
// 1st way
String myAddress = driver.findElement(By.cssSelector(cssSelector)).getText();
// 2nd way, using the JS executor (this assumes the page loads jQuery, since it calls $)
JavascriptExecutor js = (JavascriptExecutor) driver;
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("var x = $(\"" + cssSelector + "\");");
stringBuilder.append("return x.text().toString();");
myAddress = (String) js.executeScript(stringBuilder.toString());
3) Then apply a regex that keeps only visible ASCII characters (0x20 to 0x7E) to myAddress:
// myAddress = "100 RILEY DR\n AVONDALE,\n ARIZONA\n 85323-2004";
String myAddressEdited = myAddress.replaceAll("[^\\x20-\\x7E]+", "");
// -> "100 RILEY DR AVONDALE, ARIZONA 85323-2004" (this also strips any non-ASCII letters)
Hope this helps.

CSV field with newline character in a cell to import to excel

I've got a problem importing data from a CSV file into Excel.
I've parsed data from a website that contains line breaks (<br>); I convert this tag into "\n" and write the data to a CSV file. However, when I import this CSV file into Excel, the line breaks display incorrectly: each one starts a new row instead of a new line inside the same cell.
Has anyone faced this problem before? I'd really appreciate your suggestions.
Thanks!
Edit: Here is a sample that demonstrates my situation:
static void TestLine()
{
string sampleData = "日찬양 까페에 올린 충격적인 <br>글코리아타임스";
string formattedData = sampleData.Replace("<br>", "\n");
using (StreamWriter writer = new StreamWriter(@"C:\SampleData.csv", false, Encoding.Unicode))
{
writer.WriteLine(formattedData);
}
}
I want sampleData to appear in a single cell; however, it ends up split across two cells.

This looks like a malformed CSV file to me. To handle this case correctly, a field that contains a line break must be enclosed in quotation marks.
UPDATE after the sample:
This works fine:
static void Main(string[] args)
{
string sampleData = "\"日찬양 까페에 올린 충격적인 <br>글코리아타임스\"";
string formattedData = sampleData.Replace("<br>", "\n");
using (StreamWriter writer = new StreamWriter(@"C:\SampleData.csv", false, Encoding.Unicode))
{
writer.WriteLine(formattedData);
}
}
You need to wrap the field in quotation marks (").
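RFC 4180 generalizes the quoting rule above: a field is wrapped in double quotes when it contains a comma, a double quote, or a line break, and any double quote inside it is doubled. A small JavaScript sketch of that rule follows; toCsvField and toCsvRow are hypothetical helper names, and the sample cell mirrors the question's <br> conversion.

```javascript
// Quote a CSV field following RFC 4180: fields containing a comma, a double
// quote, CR or LF are wrapped in double quotes, and inner quotes are doubled.
function toCsvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsvRow(fields) {
  return fields.map(toCsvField).join(",");
}

const cell = "first line<br>second line".replace(/<br>/g, "\n");
console.log(toCsvRow([cell, 'say "hi"', "plain"]));
// "first line
// second line","say ""hi""",plain
```

Only the fields that need it get quoted, so simple values such as plain stay as they are.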

Please take a look at CSV-1203 File Format Specification, in particular the sections on "End-of-Record Marker" and "Field Payload Protection". Hopefully this should give clear guidance on the inner workings of the CSV file format.

c# uploading data error -> return "�" for space

I am using C# with an HTTP helper and a StreamReader to read text. But when I upload a text file containing this text
"Look  exactly what I found on # eBay! Willy Lee LifeLike  Chatting Butler Prop Motion Sen"
the spaces are replaced by "�" in the data my code works with.
Here is the code that reads the text:
List<string> list = new List<string>();
StreamReader reader = new StreamReader(filepath);
string text = "";
while ((text = reader.ReadLine()) != null)
{
if (!string.IsNullOrEmpty(text))
{
list.Add(text);
}
}
reader.Close();
return list;
The list then contains this data:
"Look��exactly�what�I�found�on�#�eBay!�Willy�Lee�LifeLike��Chatting�Butler�Prop�Motion�Sen"

This looks like an encoding problem. I've had similar problems when multibyte-encoded text is shown on a non-Unicode webpage, for example one using Windows-1252 or another CP-125X code page.
This looks like the same thing: the text appears to be UTF-8 encoded but displayed in ANSI mode. The spaces are probably "special" spaces, like the ones MS Word sometimes inserts, while the English characters are single bytes in UTF-8 (as are all characters below ASCII code 128), so they are compatible with the ANSI code table and display correctly.
Option 2: if the text is written to a file and saved without a BOM at the beginning, the text editor may not recognise the content as Unicode and may open it in ANSI (plain ASCII) mode.
If you give more details about where the data is read from, where it is saved, and how it is opened, I can give more concrete advice.
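One way to see where the "�" comes from, assuming the uploaded file is saved in a single-byte code page such as Windows-1252 and the "spaces" are non-breaking spaces (byte 0xA0): new StreamReader(filepath) decodes as UTF-8 by default, and a lone 0xA0 byte is not valid UTF-8, so the decoder substitutes U+FFFD. The JavaScript sketch below reproduces this with TextDecoder and Buffer; Latin-1 stands in for Windows-1252 here because the two agree on byte 0xA0.

```javascript
// "Look<NBSP>exactly" as a Windows-1252 / Latin-1 file stores it: 0xA0 is the
// non-breaking space in those code pages.
const bytes = Uint8Array.from([0x4c, 0x6f, 0x6f, 0x6b, 0xa0, 0x65, 0x78, 0x61, 0x63, 0x74, 0x6c, 0x79]);

// Decoding as UTF-8: a lone 0xA0 is invalid, so it becomes U+FFFD ("�").
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Decoding with a single-byte code page keeps it as U+00A0.
const asLatin1 = Buffer.from(bytes).toString("latin1");

console.log(asUtf8);                           // Look�exactly
console.log(asLatin1.charCodeAt(4) === 0xa0);  // true
console.log(asLatin1.replace(/\u00a0/g, " ")); // Look exactly
```

If that assumption holds, the C# fix would be to pass the file's real encoding to the reader, for example new StreamReader(filepath, Encoding.GetEncoding(1252)); on .NET Core and later, code page 1252 first has to be registered with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).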
