libunibreak 5.1
graphemebreak.c File Reference

Implementation of the grapheme breaking algorithm as described in Unicode Standard Annex 29.

#include <string.h>
#include "graphemebreak.h"
#include "graphemebreakdata.c"
#include "unibreakdef.h"
#include "emojidef.h"

Functions

void init_graphemebreak (void)
 Initializes the grapheme break internals.
 
static enum GraphemeBreakClass get_char_gb_class (utf32_t ch)
 Gets the grapheme breaking class of a character.
 
static void set_graphemebreaks (const void *s, size_t len, char *brks, get_next_char_t get_next_char)
 Sets the grapheme breaking information for a generic input string.
 
void set_graphemebreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)
 Sets the grapheme breaking information for a UTF-8 input string.
 
void set_graphemebreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)
 Sets the grapheme breaking information for a UTF-16 input string.
 
void set_graphemebreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)
 Sets the grapheme breaking information for a UTF-32 input string.
 

Detailed Description

Implementation of the grapheme breaking algorithm as described in Unicode Standard Annex 29.

Author
Andreas Röver

Function Documentation

◆ get_char_gb_class()

static enum GraphemeBreakClass get_char_gb_class ( utf32_t  ch)
static

Gets the grapheme breaking class of a character.

Parameters
[in]  ch  character to check
Returns
the grapheme breaking class if found; GBP_Other otherwise

◆ init_graphemebreak()

void init_graphemebreak ( void  )

Initializes the grapheme break internals.

It currently does nothing, but it may in the future.

◆ set_graphemebreaks()

static void set_graphemebreaks ( const void *  s,
size_t  len,
char *  brks,
get_next_char_t  get_next_char 
)
static

Sets the grapheme breaking information for a generic input string.

It uses the extended grapheme cluster ruleset.

Parameters
[in]  s  input string
[in]  len  length of the input
[out]  brks  pointer to the output breaking data, containing GRAPHEMEBREAK_BREAK or GRAPHEMEBREAK_NOBREAK
[in]  get_next_char  function to get the next UTF-32 character
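
The get_next_char parameter is what lets one routine serve every encoding: the caller passes a decoder that returns the next code point and advances an index. The following is a minimal sketch of such a decoder for UTF-32 input, together with a generic consumer. The names sketch_utf32_t, sketch_get_next_char_t and SKETCH_EOS, as well as the three-argument callback shape, are assumptions made for illustration; the library's own get_next_char_t definition should be taken from its headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the library's definitions. */
typedef uint32_t sketch_utf32_t;
#define SKETCH_EOS 0xFFFFFFFFu /* assumed end-of-string sentinel */

/* Assumed callback shape: buffer, length in code units, in/out index. */
typedef sketch_utf32_t (*sketch_get_next_char_t)(const void *s, size_t len,
                                                 size_t *ip);

/* UTF-32 decoder: every code unit is already a complete code point. */
static sketch_utf32_t sketch_next_char_utf32(const void *s, size_t len,
                                             size_t *ip)
{
    assert(s != NULL && ip != NULL);
    if (*ip >= len)
        return SKETCH_EOS;
    return ((const sketch_utf32_t *)s)[(*ip)++];
}

/* Generic consumer: counts code points with whichever decoder is passed. */
static size_t sketch_count_code_points(const void *s, size_t len,
                                       sketch_get_next_char_t next)
{
    size_t pos = 0;
    size_t n = 0;
    while (next(s, len, &pos) != SKETCH_EOS)
        n++;
    return n;
}
```

A UTF-8 or UTF-16 decoder would keep the same shape and differ only in how many code units it consumes per call, which is why the generic routine does not need to know the encoding.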

◆ set_graphemebreaks_utf16()

void set_graphemebreaks_utf16 ( const utf16_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the grapheme breaking information for a UTF-16 input string.

Parameters
[in]  s  input UTF-16 string
[in]  len  length of the input
[in]  lang  language of the input (reserved for future use)
[out]  brks  pointer to the output breaking data, containing GRAPHEMEBREAK_BREAK or GRAPHEMEBREAK_NOBREAK. The first element of the output array describes the break behind the first character. The pointer must point to an array with at least as many elements as there are characters in the string.

◆ set_graphemebreaks_utf32()

void set_graphemebreaks_utf32 ( const utf32_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the grapheme breaking information for a UTF-32 input string.

Parameters
[in]  s  input UTF-32 string
[in]  len  length of the input
[in]  lang  language of the input (reserved for future use)
[out]  brks  pointer to the output breaking data, containing GRAPHEMEBREAK_BREAK or GRAPHEMEBREAK_NOBREAK. The first element of the output array describes the break behind the first character. The pointer must point to an array with at least as many elements as there are characters in the string.
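
Because each brks element describes the break behind the character at the same index, a filled buffer can be walked to count clusters or to find where a cluster ends. The sketch below shows one way to do that for UTF-32 input, where one array element corresponds to one character. The helpers sketch_count_clusters and sketch_cluster_end are hypothetical, and the numeric values given to the two macros are placeholders so the block compiles on its own; real code should include graphemebreak.h and use the library's macros. The sketch also assumes that the element for the last character is marked as a break.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values so the sketch compiles without libunibreak.
   Real code should include graphemebreak.h and use its macros. */
#ifndef GRAPHEMEBREAK_BREAK
#define GRAPHEMEBREAK_BREAK 0
#endif
#ifndef GRAPHEMEBREAK_NOBREAK
#define GRAPHEMEBREAK_NOBREAK 1
#endif

/* Hypothetical helper: counts clusters in an already filled brks buffer.
   Assumes the element for the last character is a break. */
static size_t sketch_count_clusters(const char *brks, size_t len)
{
    size_t count = 0;
    assert(brks != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (brks[i] == GRAPHEMEBREAK_BREAK)
            count++; /* a break behind character i closes one cluster */
    }
    return count;
}

/* Hypothetical helper: returns the index one past the last character of
   the cluster that starts at `start`. */
static size_t sketch_cluster_end(const char *brks, size_t len, size_t start)
{
    size_t i = start;
    while (i < len && brks[i] != GRAPHEMEBREAK_BREAK)
        i++;
    return i < len ? i + 1 : len;
}
```

For a hand-written buffer such as { GRAPHEMEBREAK_NOBREAK, GRAPHEMEBREAK_BREAK, GRAPHEMEBREAK_BREAK } (illustrative data, not library output), the first helper reports two clusters and the second reports that the cluster starting at index 0 ends before index 2.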

◆ set_graphemebreaks_utf8()

void set_graphemebreaks_utf8 ( const utf8_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the grapheme breaking information for a UTF-8 input string.

Parameters
[in]  s  input UTF-8 string
[in]  len  length of the input
[in]  lang  language of the input (reserved for future use)
[out]  brks  pointer to the output breaking data, containing GRAPHEMEBREAK_BREAK or GRAPHEMEBREAK_NOBREAK. The first element of the output array describes the break behind the first character. The pointer must point to an array with at least as many elements as there are characters in the string.