
Collocation Extraction by dainiusjocas

Project Title: Collocation Extraction From Text Corpora

This simple Java application extracts information about bigrams from corpus files that conform to the BrownXML.xsd XML schema.
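A bigram is a pair of adjacent words, so gathering bigram statistics starts with counting those pairs. The sketch below is a hypothetical illustration, not this project's code. It assumes the corpus has already been read from the XML files into a flat list of word tokens, counts each adjacent pair, and returns the most frequent ones. The `BigramCounter` class and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Collocations.jar implementation.
// Assumes XML parsing and tokenization have already produced a word list.
public class BigramCounter {

    // Counts every adjacent word pair "w1 w2" in the token sequence.
    public static Map<String, Integer> countBigrams(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the n bigrams with the highest counts, most frequent first.
    // Ties are broken alphabetically so the output is deterministic.
    public static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("of", "the", "people", "of", "the", "state");
        Map<String, Integer> counts = countBigrams(tokens);
        System.out.println(topN(counts, 2)); // prints [of the=2, people of=1]
    }
}
```

The alphabetical tie-break is a choice made for this sketch; how Collocations.jar orders bigrams with equal counts is not stated here.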

Instructions on how to use this application are here.

Very Quick Tutorial

Run the Collocations.jar archive by typing the following into a shell:

java -jar Collocations.jar Corpus.xml results.txt -f 1000

Quick Links

  1. 1000 most frequent bigrams from the Brown Corpus
  2. 1000 bigrams from the Brown Corpus with the highest chi-square values
  3. Statistics
  4. Instructions for the project
  5. Troubles

All results were extracted from the full Brown Corpus.
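The chi-square list ranks bigrams by how strongly their two words are associated. A common formulation is Pearson's χ² test on a 2×2 contingency table. Let O11 be the count of `w1 w2`, O12 the count of `w1` followed by another word, O21 the count of another word followed by `w2`, and O22 the count of all remaining bigrams. The statistic is then χ² = N(O11·O22 − O12·O21)² / ((O11+O12)(O11+O21)(O12+O22)(O21+O22)). The sketch below assumes this standard formulation. The project may compute it differently, and the class and parameter names are hypothetical.

```java
// Hypothetical sketch of Pearson's chi-square for a bigram (w1 w2), using the
// 2x2 contingency-table shortcut formula. Not taken from the project's source.
public class BigramChiSquare {

    /**
     * @param n11 count of the bigram "w1 w2"
     * @param c1  count of bigrams whose first word is w1
     * @param c2  count of bigrams whose second word is w2
     * @param n   total number of bigrams in the corpus
     */
    public static double chiSquare(long n11, long c1, long c2, long n) {
        long n12 = c1 - n11;             // w1 followed by a word other than w2
        long n21 = c2 - n11;             // a word other than w1 followed by w2
        long n22 = n - n11 - n12 - n21;  // bigrams involving neither position
        double diff = (double) n11 * n22 - (double) n12 * n21;
        double denom = (double) (n11 + n12) * (n11 + n21) * (n12 + n22) * (n21 + n22);
        // Assumption: treat an empty row or column (undefined chi-square) as 0.
        return denom == 0 ? 0.0 : n * diff * diff / denom;
    }

    public static void main(String[] args) {
        // w1 and w2 only ever occur together: chi-square equals N.
        System.out.println(chiSquare(10, 10, 10, 20)); // prints 20.0
    }
}
```

Returning 0 when a marginal count is zero is only a convenience for this sketch. Ranking bigrams then amounts to computing this value for each bigram and sorting in descending order.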

License

Creative Commons Attribution 3.0 License

Authors

dj (dainiusjocas@gmail.com)

Contact

Dainius Jocas (dainiusjocas@gmail.com)

Download

You can download this project in either zip or tar formats.

You can also clone the project with Git by running:

$ git clone git://github.com/dainiusjocas/Computational-Linguistics