Collocation Extraction by dainiusjocas

Troubles faced

Implementation with Ruby

At first glance, the easiest solution seemed to be a Ruby script. Halfway through the project, however, Ruby showed unacceptable performance on large XML files: processing the Brown Corpus, which is 27 MB, exhausted 4 GB of RAM, and the script usually stopped responding. The problem was solved by switching to an XML parsing engine provided by Java, which performed reasonably well.
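
As a rough illustration of why a Java parser can cope with large XML files, here is a minimal sketch of streaming parsing with the StAX API (javax.xml.stream) from the Java standard library. It is not the project's actual code, and the project may have used a different Java parser. The class name StreamingWordCounter, the method countWords, and the element name "w" for word tokens are assumptions made for the example. The point is that the reader walks the document one event at a time, so memory use depends on the size of the frequency table rather than on the size of the file.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the project.
public class StreamingWordCounter {

    // Counts the text of every <w> element without building a document tree.
    // Assumption: word tokens are stored as <w>...</w> with text-only content.
    public static Map<String, Integer> countWords(Reader input) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(input);
        Map<String, Integer> counts = new HashMap<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "w".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    String word = reader.getElementText().trim().toLowerCase(Locale.ROOT);
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        } finally {
            // XMLStreamReader is not AutoCloseable, so close it explicitly.
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<s><w>The</w> <w>jury</w> <w>said</w> <w>the</w></s>";
        System.out.println(countWords(new StringReader(xml)));
    }
}
```

For a real corpus file, the StringReader would be replaced by a reader over the file, and the per-word counts could be extended to adjacent word pairs as a starting point for collocation candidates.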

Validation of Results

I found it difficult to validate whether the results are reliable.

License

Creative Commons Attribution 3.0 License

Authors

dj (dainiusjocas@gmail.com)

Contact

Dainius Jocas (dainiusjocas@gmail.com)

Download

You can download this project in either zip or tar formats.

You can also clone the project with Git by running:

$ git clone git://github.com/dainiusjocas/Computational-Linguistics