HPDF 1.1: Introducing typesetting

Posted by alpheccar - Sep 14 2007 at 15:50 CEST

I have finally released HPDF 1.1 with some typesetting features; this post gives the details. It is very experimental, but it works. I am not at all happy with the HPDF API, but I have no choice: I need to add features as fast as possible for another project. I'll think about the elegance of the API later, even if that means changing lots of things in HPDF (not really a good development methodology, I agree). In addition to the typesetting features, I fixed lots of problems, optimized the code and changed the image API a little.

Typesetting a paragraph

Typesetting is a complex thing and the current implementation is very limited. You'll be able to use it to generate slides but not a book. There is no support for multiple pages: pages have to be created manually, and the typesetting code assumes that the whole output fits on a single page.

I have focused on the line breaking algorithm and on styles. Here is an example:

image

This example was created with a ParagraphStyle.

Here is a part of the ParagraphStyle class (the full definition is in the Haddock documentation):

Haskell Code by HsColour
class ParagraphStyle a where
    lineWidth :: a -> PDFFloat -> Int -> PDFFloat
    linePosition :: a -> PDFFloat -> Int -> PDFFloat
    paraChange :: a -> [Letter] -> (a,[Letter])
    paragraphStyle :: a -> Maybe (Rectangle -> Draw b -> Draw ())

Let's see how this interface can be used to create the above example. When a paragraph monad is run, the text is transformed into a sequence of Letters. Here is a part of the Letter type:

Haskell Code by HsColour
data Letter  = Letter BoxDimension !AnyBox !(Maybe AnyStyle) 
             | Glue !PDFFloat !PDFFloat !PDFFloat !(Maybe AnyStyle)
             | AChar !AnyStyle !Char !PDFFloat

In this definition, a Letter is in fact any object that can be displayed. Perhaps I should have called it a generalized letter.
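The post does not say what the three PDFFloat fields of Glue mean. Assuming they follow the usual TeX convention (natural width, stretchability, shrinkability), here is a minimal sketch of how a line breaker could use them to decide how much a line must stretch or shrink. Item and adjustmentRatio are simplified, hypothetical stand-ins, not HPDF names.

```haskell
-- Simplified stand-ins for illustration; these are not the HPDF types.
type PDFFloat = Double

data Item
  = Box PDFFloat                      -- fixed-width material (a character, a picture)
  | Glue PDFFloat PDFFloat PDFFloat   -- ASSUMED order: natural width, stretch, shrink

itemWidth :: Item -> PDFFloat
itemWidth (Box w)      = w
itemWidth (Glue w _ _) = w

stretchOf, shrinkOf :: Item -> PDFFloat
stretchOf (Glue _ y _) = y
stretchOf _            = 0
shrinkOf (Glue _ _ z)  = z
shrinkOf _             = 0

-- How far the glue on a line must stretch (positive ratio) or shrink
-- (negative ratio) for the line to reach the target width.
-- Nothing means the line has no glue able to absorb the difference.
adjustmentRatio :: PDFFloat -> [Item] -> Maybe PDFFloat
adjustmentRatio target items
  | diff > 0 && stretch > 0 = Just (diff / stretch)
  | diff < 0 && shrink > 0  = Just (diff / shrink)
  | diff == 0               = Just 0
  | otherwise               = Nothing
  where
    diff    = target - sum (map itemWidth items)
    stretch = sum (map stretchOf items)
    shrink  = sum (map shrinkOf items)
```

With a line made of Box 40, Glue 10 5 3 and Box 50 (natural width 100), a target width of 110 gives a ratio of 2: the glue has to stretch by twice its stretchability.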

A sequence of letters is processed by the paraChange function. In the style used for the previous example, this function removes the first letter of the paragraph (an AChar) and replaces it with a generalized Letter containing a bigger, colored picture of it. The size of this new letter is remembered in a new version of the style. That's why paraChange returns a new style in addition to a new sequence of letters.
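To make this step concrete, here is a minimal sketch with simplified stand-in types. DropCap, BigLetter and scaleFactor are hypothetical names introduced for illustration; the real style works on HPDF's Letter type and builds a box that draws the big letter.

```haskell
type PDFFloat = Double

-- Simplified stand-ins for the post's Letter type and paragraph style.
data Letter
  = AChar Char PDFFloat       -- an ordinary character and its width
  | BigLetter Char PDFFloat   -- hypothetical: the enlarged first letter and its width
  deriving (Eq, Show)

-- The style remembers the width of the enlarged letter (0 before processing).
newtype DropCap = DropCap PDFFloat
  deriving (Eq, Show)

-- ASSUMED enlargement factor, chosen only for the example.
scaleFactor :: PDFFloat
scaleFactor = 3

-- Replace the first character with an enlarged one and record its width
-- in a new version of the style. Any other input is returned unchanged.
paraChange :: DropCap -> [Letter] -> (DropCap, [Letter])
paraChange _ (AChar c w : rest) = (DropCap bigW, BigLetter c bigW : rest)
  where
    bigW = scaleFactor * w
paraChange style letters = (style, letters)
```

Returning the style alongside the letters is what lets later stages see the recorded width without any mutable state.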

Then, when the line breaking algorithm is called, it uses the lineWidth and linePosition functions to determine the shape of the paragraph. That shape depends on the size of the letter recorded in the previous step.
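Here is a sketch of how such a style could shape the paragraph around the big letter. It assumes that the PDFFloat argument is the default line width, that the Int is a 1-based line number, and that the big letter spans a fixed number of lines; DropCap and dropLines are hypothetical, and the functions are written as plain functions rather than as a ParagraphStyle instance.

```haskell
type PDFFloat = Double

-- Hypothetical style: it only stores the width recorded for the big letter.
newtype DropCap = DropCap PDFFloat

-- ASSUMPTION: the enlarged letter occupies the first two lines.
dropLines :: Int
dropLines = 2

-- Lines next to the big letter are shortened by its width.
lineWidth :: DropCap -> PDFFloat -> Int -> PDFFloat
lineWidth (DropCap w) width nb
  | nb <= dropLines = width - w
  | otherwise       = width

-- The same lines start further right, by the width of the big letter.
linePosition :: DropCap -> PDFFloat -> Int -> PDFFloat
linePosition (DropCap w) _ nb
  | nb <= dropLines = w
  | otherwise       = 0
```

For a recorded width of 30 and a default width of 200, the first two lines are 170 wide and start at offset 30; from the third line on, the full width is available again.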

There is a final trick: the new, bigger letter should not change the interline spacing. So the Box created by paraChange to contain the letter has null dimensions; its only function is to display a big letter.

Finally, when the lines are displayed, the style function paragraphStyle is used. For that function, a paragraph is a sequence of lines with the same paragraph style. One argument of that function is the paragraph bounding rectangle. It is used to draw the red border and fill the paragraph background.

So, to display the previous example, you just need to write:

Haskell Code by HsColour
setStyle BlueStyle
setParaStyle (BluePara 0)
paragraph $ do
    txt $ "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do "
    txt $ "eiusmod tempor incididunt ut labore et dolore magna aliqua. "
    txt $ "Ut enim ad minim veniam, quis nostrud exercitation ullamco "
    txt $ "laboris nisi ut aliquip ex ea commodo consequat. Duis aute "
    txt $ "irure dolor in reprehenderit in voluptate velit esse cillum "
    txt $ "dolore eu fugiat nulla pariatur. Excepteur sint occaecat "
    txt $ "cupidatat non proident, sunt in cu"

The paragraph style does all the work.

Sentence and word styles

Here is a new example:

image

The method is the same. Instead of a paragraph style, I am using sentence and word styles. The first style used in this example is a sentence style responsible for drawing a red rectangle around words. Note that for a sentence style, the unit of processing is the line. So, if a sentence is broken by the line breaking algorithm, it will be processed as several sentences (think of a URL).
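The per-line processing can be pictured with a small model. This is only a sketch under assumptions: each word is tagged with the name of the sentence style that was active when it was typeset, and StyledWord and sentenceRuns are hypothetical names, not part of HPDF.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical model: a word tagged with its sentence style, if any.
type StyledWord = (Maybe String, String)

-- For each line, group consecutive words that share the same sentence style.
-- Grouping never crosses a line boundary, so a styled sentence broken over
-- two lines yields two separate runs.
sentenceRuns :: [[StyledWord]] -> [[(Maybe String, [String])]]
sentenceRuns = map runsOnLine
  where
    runsOnLine ws =
      [ (style, map snd run)
      | run@((style, _) : _) <- groupBy ((==) `on` fst) ws
      ]
```

If a sentence styled "red" starts at the end of one line and finishes on the next, sentenceRuns returns one "red" run per line, so a rectangle drawn around each run appears twice.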

After the red rectangle, the picture contains an example of a word style. At the beginning of the style, a random generator is started and used to style the words. This means that the style is updated from word to word. It is not visible on this screenshot because a bug prevented the update from occurring. I fixed it before uploading the library to Hackage ... and I hope this quick fix has not introduced other problems.
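A style that is updated from word to word amounts to threading a state through the list of words. Here is a minimal sketch of that idea; it uses a small linear congruential generator so that it needs nothing outside base, whereas the post does not say which generator HPDF's demo uses. WordStyle, Color, nextStyle and styleWords are hypothetical names.

```haskell
import Data.List (mapAccumL)

-- Hypothetical word style: it carries the state of a pseudo-random generator.
newtype WordStyle = WordStyle Integer
  deriving (Eq, Show)

data Color = Red | Green | Blue
  deriving (Eq, Show, Enum, Bounded)

-- Advance the generator (a textbook linear congruential step, used here only
-- as a stand-in) and derive a color from the new state.
nextStyle :: WordStyle -> (WordStyle, Color)
nextStyle (WordStyle seed) = (WordStyle seed', toEnum (fromInteger (seed' `mod` 3)))
  where
    seed' = (seed * 1103515245 + 12345) `mod` 2147483648

-- Style each word with the current state and pass the updated state on.
styleWords :: WordStyle -> [String] -> (WordStyle, [(String, Color)])
styleWords = mapAccumL step
  where
    step style w = let (style', color) = nextStyle style in (style', (w, color))
```

The bug described above corresponds to reusing the same state for every word instead of passing the updated one along, which makes every word come out with the same style.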

Finally, the last example uses both a sentence style and a word style. The word style styles the words and the glues differently. The sentence style draws a blue rectangle under the text and a blue line over it.

Note that the styling functions receive a Draw monad value as an argument, so they can potentially do much more, such as rotating each word.

Paragraph shape

Another example:

image

This last example uses another paragraph style to fill a circle with text. Note that the display stops as soon as a line falls below the bottom edge of the bounding rectangle.
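The geometry behind such a shape is simple: the width available on a line is the length of the circle's chord at that height. Here is a sketch, assuming a fixed line height, 1-based line numbers counted from the top of the circle, and positions measured from the left edge of the bounding square. circleLineWidth and circleLinePosition are hypothetical helpers, not HPDF functions.

```haskell
type PDFFloat = Double

-- Width available on line nb of a circle of the given radius: the length of
-- the chord through the vertical middle of that line. Lines lying outside
-- the circle get a width of 0.
circleLineWidth :: PDFFloat -> PDFFloat -> Int -> PDFFloat
circleLineWidth radius lineHeight nb
  | abs d >= radius = 0
  | otherwise       = 2 * sqrt (radius * radius - d * d)
  where
    -- signed vertical distance from the middle of the line to the center
    d = radius - (fromIntegral nb - 0.5) * lineHeight

-- Horizontal offset of the line, so that each chord is centered.
circleLinePosition :: PDFFloat -> PDFFloat -> Int -> PDFFloat
circleLinePosition radius lineHeight nb =
  radius - circleLineWidth radius lineHeight nb / 2
```

With a radius of 50 and a line height of 10, the lines grow wider down to the fifth and sixth lines, which are equally wide, and then narrow again; from the eleventh line on the width is 0, which matches the point where the display stops.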

Conclusion

It is perhaps too much work for one person in his spare time :-) In a future post, perhaps, I'll try to explain how allegories (an extension of category theory) are relevant to the problem of designing a line breaking algorithm and how I could improve my current algorithm.

You can find the lib on Hackage.

Once the lib is installed, go to the test folder, type make demo and then ./test. It should create the demo.pdf file.

It was tested with GHC only.

Comments

Working on 1.2

Posted by alpheccar - Sep 24 2007 at 21:58 CEST

I am working on a 1.2 version that will clean up lots of things and add an API to build more complex documents, like books with cross-references. The style API is going to change a lot because the current one is ugly ...

Posted by alpheccar - Sep 14 2007 at 23:24 CEST

Unfortunately, I have not yet had the time to really document the library, and I don't think the demo and the Haddock documentation are enough.

So, don't hesitate to ask questions here if some things are not clear.

really nice stuff btw

Posted by Steven - Sep 14 2007 at 23:18 CEST

Typesetting has always been one of those things I couldn't quite get. Hopefully I'll get around to playing with HPDF sometime soon :)

Posted by alpheccar - Sep 14 2007 at 22:44 CEST

Yes. Glyph is probably a better choice.

Generalized letter

Posted by Steven - Sep 14 2007 at 22:11 CEST

A glyph? Maybe that has a specific typesetting meaning....

Impressive !!

Posted by Jedaï - Sep 14 2007 at 18:10 CEST

Your PDF library seems more and more interesting. Good PDF generators aren't that easy to find and are often too cryptic or esoteric to be used without a good deal of investment. If Haskell can get such a library, it would be a good point in its favor for a lot of applications. Good work. :-)

-- Jedaï