RFTagger Java Interface
Introduction
RFTagger Java Interface is a flavour of Helmut Schmid's and Florian Laws' RFTagger that runs as a library and as an application in Java. It uses their original code as a binary library, which is loaded automatically. It features many small but nice improvements and works out of the box on all three major platforms.
RFTagger Java Interface is brought to you by Ramon Ziai and Niels Ott, funded by SFB 833, Project A4.
Features
RFTagger Java Interface can do everything that the original version can do. And it can do even more. Here's the list of improvements:
- Refuses to load incompatible parameter files.
- Automatically selects the 32-bit or 64-bit parameter file (library only).
- Handles the encoding of parameter files and therefore works flawlessly on UTF-8-based systems.
- Can additionally output STTS tags (German STTS only; other languages need further development).
- Runs on Linux, OS X, and Windows, both 32bit and 64bit, out of the box.
- Is available as an easy-to-use Java library as well as a standalone tool.
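The automatic choice between 32-bit and 64-bit parameter files can be pictured roughly as follows. This is only a sketch of the idea, not the library's actual code: the class ParamFileChooser and its methods are hypothetical, and relying on the sun.arch.data.model and os.arch system properties is an assumption made for illustration.

```java
import java.io.File;

// Hypothetical helper for illustration; NOT part of RFTagger Java Interface.
public class ParamFileChooser {

    // Decide whether the running JVM is 64-bit.
    // dataModel is expected to come from the "sun.arch.data.model" property
    // (may be null or "unknown" on some JVMs), osArch from "os.arch".
    static boolean is64Bit(String dataModel, String osArch) {
        if ("64".equals(dataModel)) {
            return true;
        }
        if ("32".equals(dataModel)) {
            return false;
        }
        // Fall back to the architecture name, e.g. "amd64" or "x86_64".
        return osArch != null && osArch.contains("64");
    }

    // Pick the parameter file that matches the detected word size.
    static File choose(File par32, File par64, boolean is64) {
        return is64 ? par64 : par32;
    }

    public static void main(String[] args) {
        boolean is64 = is64Bit(System.getProperty("sun.arch.data.model"),
                               System.getProperty("os.arch"));
        File chosen = choose(new File("german-pc-32bit.par"),
                             new File("german-pc-64bit.par"), is64);
        System.out.println("Would load: " + chosen.getName());
    }
}
```

When using the library, none of this is needed: you pass both files to the Model constructor and the selection happens for you.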
Using it on the Command Line
Type java -jar rft-java.jar and there you go. You'll need the parameter files that come with the original RFTagger.
Using it as a Library
Basic usage:
// standard-library imports; Model and RFTagger come from rft-java.jar
import java.io.File;
import java.util.Arrays;
import java.util.List;

// initialize the parameter files (aka: auto-select the right model)
Model model = new Model(new File("/home/rftj/german-pc-32bit.par"), new File("/home/rftj/german-pc-64bit.par"));
// initialize the tagger with defaults
RFTagger rft = new RFTagger(model);
// do some tagging
String[] words = new String[]{"Das", "ist", "ein", "Test", "."};
List<String> tags = rft.getTags(Arrays.asList(words));
Using the tagset converter:
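// additionally needs: import java.util.LinkedList; import java.util.List;
TagsetConverter conv = ConverterFactory.getConverter("stts");
// convert each RFTagger tag to its STTS counterpart
List<String> sttsTags = new LinkedList<String>();
for (String tag : tags) {
    sttsTags.add(conv.rftag2tag(tag));
}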
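To inspect the result, words and tags can be printed side by side. The sketch below is self-contained and does not call the tagger: the class TagPrinter and its format method are hypothetical, the tags are made-up placeholders rather than real RFTagger or STTS output, and it assumes the tag list holds exactly one tag per input word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; NOT part of RFTagger Java Interface.
public class TagPrinter {

    // Pair each word with the tag at the same position, as "word<TAB>tag".
    // Assumes one tag per word; fails loudly if the lists differ in length.
    static List<String> format(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException(
                "Expected one tag per word, got " + words.size()
                + " words and " + tags.size() + " tags");
        }
        List<String> lines = new ArrayList<String>(words.size());
        for (int i = 0; i < words.size(); i++) {
            lines.add(words.get(i) + "\t" + tags.get(i));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Das", "ist", "ein", "Test", ".");
        // Placeholder tags; in real use these would come from the tagger or the converter.
        List<String> tags = Arrays.asList("TAG-1", "TAG-2", "TAG-3", "TAG-4", "TAG-5");
        for (String line : format(words, tags)) {
            System.out.println(line);
        }
    }
}
```

In real use, the second argument would be the tags or sttsTags list from the snippets above.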
Legal Stuff
RFTagger Java Interface ships with third-party code that is covered by several different licenses.
- RFTagger Copyright: Helmut Schmid and Florian Laws. RFTagger is freely available for education, research and other non-commercial purposes.
- Apache Commons CLI by Apache Software Foundation, Apache License Version 2.0
- Java Native Access (JNA), GNU Lesser General Public License
- RFTagger Java Interface Copyright: Ramon Ziai and Niels Ott, License CC BY-NC-SA
TODO
- Beta-testing
- Test STTS mapping
- Develop other tagset mappings or accept contributions
- Once beta status has been overcome, release source code
Known Issues
- The Windows version currently works only in a MinGW shell. We need to compile a static library, but we are not Windows experts.
Requirements
- Java 6 (comes with any reasonable OS anyway)
Download
Beta 8, aka 0.0.8, released 2012-02-09:
Other stuff: