Application
Sample Corpus

Collocation Extraction by dainiusjocas

How to use the application

  1. How to download the application?

    At the top of the page there are two icons. Click on the left icon and the download will start.

  2. Where can I download data to run the application on?

    At the top of the page there are two icons. Right-click the right icon and choose "Save Link As". NOTE: this is just a sample corpus. To download the full Brown corpus, go to http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml.

  3. What else do I need to run the application?

    To run the application, you need a Java VM installed on your computer and [JAVA_HOME]/bin on your PATH. I also advise using a command-line shell.
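
    One way to confirm that the "java" launcher is reachable through PATH is the small Python sketch below. It is not part of the application, and the helper name "find_java" is made up for illustration; it only relies on the standard library function shutil.which.

```python
import shutil


def find_java():
    """Return the full path of the `java` launcher found on PATH, or None."""
    # shutil.which searches the directories listed in PATH, much like a shell does.
    return shutil.which("java")


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("java was not found on PATH; check that [JAVA_HOME]/bin is listed there")
    else:
        print("java found at", path)
```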

  4. How to run the application?

    Navigate to the directory where you downloaded the "Collocations.jar" file. Type a line into the shell formatted like this:

    java -jar Collocations.jar [data_file] [output_file] [mode] [options]
    Where:
    • "data_file" is the file where the corpus is stored (e.g. the one you downloaded in step 2).
    • "output_file" is the file where the results of the computation will be written.
    • "mode" is either "-f" (frequency count) or "-c" (chi-square computation).
    • "options" currently accepts only a number that tells the application how many lines to write to the output file.
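
    The argument order described above can be sketched as a small Python helper that assembles the command line. This is only an illustration: the function name "build_command" and the file names "brown_sample.txt" and "results.txt" are hypothetical, and the sketch assumes the number of output lines is always passed as the last argument.

```python
import shlex

JAR_NAME = "Collocations.jar"
VALID_MODES = {"-f", "-c"}  # frequency count, chi-square computation


def build_command(data_file, output_file, mode, max_lines):
    """Build the argument list: java -jar Collocations.jar data output mode N."""
    if mode not in VALID_MODES:
        raise ValueError('mode must be "-f" or "-c", got %r' % (mode,))
    # Assumption: the "options" argument is a plain integer line count.
    return ["java", "-jar", JAR_NAME, data_file, output_file, mode, str(int(max_lines))]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    cmd = build_command("brown_sample.txt", "results.txt", "-c", 100)
    print(shlex.join(cmd))
```

    The resulting list can be handed to subprocess.run(cmd, check=True) from the directory that contains the jar file.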

  5. How to read the output file?

    The output file is formatted as follows: every line has two parts separated by the " " symbol, where:

    • The first part is the bigram, formatted as "[head]/[TYPE]_[tail]/[type]".
    • The second part is a number showing either how many times the collocation occurred in the corpus or the chi-square value of the bigram, depending on whether you passed "-f" or "-c" to the application.
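
    A line in this format could be read with a short Python sketch like the one below. It is not part of the application. The function name "parse_line" and the sample line "new/jj_york/np 42" are invented for illustration, and the sketch assumes that words and tags contain no "_" character and that the two parts are separated by whitespace.

```python
def parse_line(line):
    """Split one output line into ((head, tag), (tail, tag), value)."""
    # The number is the last whitespace-separated field on the line.
    bigram, value = line.rsplit(None, 1)
    # Assumption: "_" appears only between the two tokens of the bigram.
    head_token, tail_token = bigram.split("_", 1)
    # The tag follows the last "/" of each token.
    head, _, head_tag = head_token.rpartition("/")
    tail, _, tail_tag = tail_token.rpartition("/")
    # float covers both frequency counts and chi-square values.
    return (head, head_tag), (tail, tail_tag), float(value)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from real output.
    print(parse_line("new/jj_york/np 42"))
```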
