[Apertium-stuff] separable words module -- call for requests
Francis Tyers
2017-07-10 09:20:27 UTC
Hello everyone!

We're making progress on a module for handling separable "multiword"
expressions. The aim is to be able to do things like:

^take$ ^the$ ^rubbish$ ^out$
  -> ^take# out$ ^the$ ^rubbish$
  -> ^sacar$ ^la$ ^basura$

^be$ ^always$ ^late$
  -> ^be# late$ ^always$
  -> ^llegar# tarde$ ^siempre$

^take$ ^the$ ^rubbish$ ^out of$ ^here$
  -> ^take# out$ ^the$ ^rubbish$ ^of$ ^here$
  -> ^sacar$ ^la$ ^basura$ ^de$ ^aquí$
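
As a rough illustration only, the first step of the first two examples can
be mimicked in Python. This is a sketch, not the module's implementation or
rule format: the SEPARABLE lexicon, the max_gap limit and all function names
are assumptions made for the example, and it uses the simplified tag-free
notation of the examples above. It does not split a unit such as ^out of$,
so the third example is not covered.

```python
import re

# Hypothetical lexicon: (head lemma, separated part) pairs forming one unit.
SEPARABLE = {("take", "out"), ("be", "late")}


def parse_units(stream):
    """Extract the contents of each ^...$ unit from a simplified stream."""
    return re.findall(r"\^(.*?)\$", stream)


def merge_separable(units, lexicon=SEPARABLE, max_gap=3):
    """Attach a separated part to its head if at most max_gap units intervene."""
    out = []
    used = set()  # indices of parts already attached to a head
    for i, unit in enumerate(units):
        if i in used:
            continue
        merged = unit
        # Look ahead over at most max_gap intervening units.
        for j in range(i + 1, min(i + 2 + max_gap, len(units))):
            if j not in used and (unit, units[j]) in lexicon:
                merged = f"{unit}# {units[j]}"
                used.add(j)
                break
        out.append(merged)
    return out


def to_stream(units):
    """Serialise units back into the simplified ^...$ notation."""
    return " ".join(f"^{u}$" for u in units)


print(to_stream(merge_separable(parse_units("^take$ ^the$ ^rubbish$ ^out$"))))
```

The bounded look-ahead is the point of the sketch: the part is only found
if it occurs within a fixed window after the head.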

The plan is for it to be a finite-state transducer (like the existing
monolingual and bilingual dictionaries), but one that operates over
words. It will sit between the pretransfer module and the lexical
transfer module (apertium-pretransfer | new module | lt-proc -b).

So, this email is a call for language pair developers to give us
examples of phenomena you would like to treat in your language pair.

Thanks!

Fran
Benedikt Freisen
2017-07-10 17:17:56 UTC
[Copy of the reply that I accidentally sent to Francis, only.]

Well, for German we would need something that correctly handles the
following:

-- Block-quote from "The Awful German Language" by Mark Twain --

The Germans have another kind of parenthesis, which they make by
splitting a verb in two and putting half of it at the beginning of an
exciting chapter and the other half at the end of it. Can any one
conceive of anything more confusing than that? These things are called
"separable verbs." The German grammar is blistered all over with
separable verbs; and the wider the two portions of one of them are
spread apart, the better the author of the crime is pleased with his
performance. A favorite one is reiste ab -- which means departed. Here
is an example which I culled from a novel and reduced to English:

"The trunks being now ready, he DE- after kissing his mother and
sisters, and once more pressing to his bosom his adored Gretchen, who,
dressed in simple white muslin, with a single tuberose in the ample
folds of her rich brown hair, had tottered feebly down the stairs, still
pale from the terror and excitement of the past evening, but longing to
lay her poor aching head yet once again upon the breast of him whom she
loved more dearly than life itself, PARTED."

-- end of quote --

Please note that it is
"Sie reisen ab." (= "They depart.")
but
", wenn sie abreisen." (= ", when they depart.")

Greetings
Benedikt
Flammie Pirinen
2017-07-10 21:47:07 UTC
Post by Benedikt Freisen
Well, for German we would need something that correctly handles the
[...]
Post by Benedikt Freisen
Please note that it is
"Sie reisen ab." (= "They depart.")
but
", wenn sie abreisen." (= ", when they depart.")
I’ve sent a German test set[1] based on my experiences with
apertium-fin-deu, but it would be good if a native speaker could
search and annotate a corpus of real-world examples. Note, though,
that cases like the one above, where there are no intervening words
between the prefix and the verb, are already handled fine.
--
Flammie, computer scientist bachelor + linguist master = computational
linguist doctor, free software Finnish localiser,
and more! <http://www.iki.fi/flammie/>
Francis Tyers
2017-07-11 08:09:17 UTC
Post by Flammie Pirinen
I’ve sent a German test set[1] based on my experiences with
apertium-fin-deu, but it would be good if a native speaker could
search and annotate a corpus of real-world examples. Note, though,
that cases like the one above, where there are no intervening words
between the prefix and the verb, are already handled fine.
Cases that currently work are also good to include. :)

F.
Francis Tyers
2017-07-11 08:25:20 UTC
Post by Benedikt Freisen
[Copy of the reply that I accidentally sent to Francis, only.]
Well, for German we would need something that correctly handles the
[snip]
Post by Benedikt Freisen
Please note that it is
"Sie reisen ab." (= "They depart.")
but
", wenn sie abreisen." (= ", when they depart.")
This is unlikely to happen unless Apertium moves away from being a
primarily finite-state-based platform. The best we can do is allow a
fixed number of patterns, specified by you, between the verb and the
separable part.
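
To make "a fixed number of patterns" concrete, here is a minimal Python
sketch. Everything in it is hypothetical: the tag names, the GAP_PATTERNS
set, the lexicon entry and the sentence "Sie reisen heute ab" are invented
for illustration and are not the module's actual format. The verb and its
separable part are joined only when the tags of the words between them
match one of the listed patterns, and only the first match is handled.

```python
# Hypothetical data, for illustration only.
SEPARABLE = {("reisen", "ab"): "abreisen"}
# Tag sequences allowed between the verb and its separable part.
GAP_PATTERNS = {(), ("adv",), ("det", "n"), ("adv", "adv")}


def join_separable(units, lexicon=SEPARABLE, gaps=GAP_PATTERNS):
    """units is a list of (lemma, tag) pairs; returns a list with the
    first matching verb/part pair joined into a single unit."""
    longest = max(len(g) for g in gaps)
    for i, (lemma, tag) in enumerate(units):
        if tag != "vblex":
            continue
        # The part can be no further away than the longest pattern allows.
        for j in range(i + 1, min(i + 2 + longest, len(units))):
            part = units[j][0]
            gap = tuple(t for _, t in units[i + 1:j])
            if (lemma, part) in lexicon and gap in gaps:
                joined = (lexicon[(lemma, part)], tag)
                return units[:i] + [joined] + units[i + 1:j] + units[j + 1:]
    return units


sentence = [("sie", "prn"), ("reisen", "vblex"), ("heute", "adv"), ("ab", "pr")]
print(join_separable(sentence))
```

Because the set of patterns is finite, a gap longer than the longest
listed pattern (as in the Twain example) is simply left untouched.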

Fran