[Solved] Problem with Unwrap script

Post by foliator »

I use the script Unwrap.js very often for stripping out unwanted line breaks in order to reflow paragraphs, but it frequently leaves line breaks in the middle of some paragraphs. Using the special characters plugin I can see the CR/LFs and eliminate them, but in a long document that takes a lot of time. Does anyone know what would cause the problem, and is there a way I can solve it without debugging the script? I have no experience working with JavaScript.

BTW, the problem has persisted through every AkelPad update.
Last edited by foliator on Wed Oct 09, 2013 2:38 pm, edited 1 time in total.

Post by Instructor »

foliator
Unwrap.js? That is probably an old name for LinesUnwrap.js. If so, can you post the specific text that gives you problems?

Post by foliator »

Yes, it's the same script; I think I must have renamed it. Here's a small example. Below is a single paragraph from a Gutenberg text file of Charles Dickens' Tale of Two Cities. I've used the script on it while posting this message and pasted in the results. Hopefully the forum's input form won't change it when I post this.

1) The original paragraph, with its CR/LF at the end of each line:

All these things, and a thousand like them, came to pass in and close
upon the dear old year one thousand seven hundred and seventy-five.
Environed by them, while the Woodman and the Farmer worked unheeded,
those two of the large jaws, and those other two of the plain and the
fair faces, trod with stir enough, and carried their divine rights
with a high hand. Thus did the year one thousand seven hundred
and seventy-five conduct their Greatnesses, and myriads of small
creatures--the creatures of this chronicle among the rest--along the
roads that lay before them.

2) The paragraph after using the script. As you can see, it's only partly unwrapped:

All these things, and a thousand like them, came to pass in and close upon the dear old year one thousand seven hundred and seventy-five.
Environed by them, while the Woodman and the Farmer worked unheeded, those two of the large jaws, and those other two of the plain and the fair faces, trod with stir enough, and carried their divine rights with a high hand. Thus did the year one thousand seven hundred and seventy-five conduct their Greatnesses, and myriads of small creatures--the creatures of this chronicle among the rest--along the roads that lay before them.

Post by Drugmix »

I think the script doesn't convert a CR/LF that follows a period into a space.

If it did, all of the text would be merged into a single paragraph.
The problem is how to distinguish the end of a paragraph from the end of a sentence in the middle of a paragraph.

The only idea I have is to transform the initial text so that paragraphs are separated by TWO newline characters.
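
To illustrate the behaviour described above, here is a minimal sketch. It uses the pattern from the script's unwrap step as quoted in Instructor's reply further down; the function name `unwrapOriginal`, the shortened sample text, and the use of `console.log` (for running it in Node.js rather than inside AkelPad) are assumptions made for the illustration.

```javascript
// Pattern from the script's unwrap step: a line break is joined only when
// the character before it is NOT one of . ? ! : ; (or another newline).
function unwrapOriginal(text) {
  return text.replace(/([^.?!:;\n])\n[ \t]*([^\n])/g, "$1 $2");
}

// A few lines from the sample paragraph, using "\n" as the line break.
var sampleOriginal =
  "came to pass in and close\n" +
  "upon the dear old year one thousand seven hundred and seventy-five.\n" +
  "Environed by them, while the Woodman and the Farmer worked unheeded,\n" +
  "those two of the large jaws";

var resultOriginal = unwrapOriginal(sampleOriginal);

// The break after "seventy-five." survives because a period precedes it;
// the other two breaks are replaced with spaces.
console.log(resultOriginal);
```

So any line that happens to end with a sentence terminator is treated as the end of a paragraph, which matches the partly unwrapped result shown earlier.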

Post by foliator »

Drugmix wrote:
I think the script doesn't convert a CR/LF that follows a period into a space.
If it did, all of the text would be merged into a single paragraph.
The problem is how to distinguish the end of a paragraph from the end of a sentence in the middle of a paragraph.
The only idea I have is to transform the initial text so that paragraphs are separated by TWO newline characters.

Actually, the rest of the paragraphs in that file are already separated that way, just as the paragraphs look in this post. If I highlight several paragraphs in the file, they unwrap separately too, but the job is still incomplete. As an experiment I took several paragraphs and added yet another newline character to the end of each one. The result was still incorrect: the paragraphs were properly separated, but not fully unwrapped.

Naturally it would be unwise to use a script like this on an entire document in one operation, anyway, because there may be some lists or poems that have very short lines; they would wind up as paragraphs. Been there, done that -- undid that. :lol:

Post by Instructor »

foliator
Remove ".?!:;" from the paragraph delimiters:

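  //Unwrap lines
  //Old pattern: a line break after . ? ! : ; was kept as a paragraph end
  //pSelText=pSelText.replace(/([^.?!:;\n])\n[ \t]*([^\n])/g, "$1 $2");
  //New pattern: join every single line break; blank lines still separate paragraphs
  pSelText=pSelText.replace(/([^\n])\n[ \t]*([^\n])/g, "$1 $2");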
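
A small sketch of what the changed line does, for anyone who wants to try it outside the script. The function name `unwrapAll`, the placeholder second paragraph, and `console.log` (Node.js instead of AkelPad) are assumptions for the example, and the text is assumed to use "\n" line breaks, as the pattern itself does.

```javascript
// Modified pattern: every single line break between two non-newline
// characters becomes a space. A blank line ("\n\n") never matches, because
// one side of each "\n" in it is itself a "\n".
function unwrapAll(text) {
  return text.replace(/([^\n])\n[ \t]*([^\n])/g, "$1 $2");
}

// First paragraph: lines from the sample above. Second paragraph: placeholder text.
var sampleFixed =
  "upon the dear old year one thousand seven hundred and seventy-five.\n" +
  "Environed by them, while the Woodman and the Farmer worked unheeded,\n" +
  "those two of the large jaws\n" +
  "\n" +
  "Placeholder second paragraph, line one.\n" +
  "Placeholder second paragraph, line two.";

var resultFixed = unwrapAll(sampleFixed);

// Each paragraph is now a single line; the blank line between them is kept.
console.log(resultFixed);
```

Because only blank lines now count as paragraph breaks, short-line blocks such as lists or poems inside the selection will be joined too, so it is still worth selecting only the paragraphs you want reflowed.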

Post by foliator »

Instructor wrote:
Remove ".?!:;" from the paragraph delimiters:
  //Unwrap lines
  //pSelText=pSelText.replace(/([^.?!:;\n])\n[ \t]*([^\n])/g, "$1 $2");
  pSelText=pSelText.replace(/([^\n])\n[ \t]*([^\n])/g, "$1 $2");

Thank you, Instructor, that fixed it. The script now works perfectly, even on a large portion of text with lots of paragraphs! :D