merganser

Merganser is a scalable and extendable tool for analyzing merge scenarios in git repositories

This repository provides a toolchain for gathering and analyzing merge scenarios found in git repositories. The tool stores the collected data in a normalized MySQL database. It supports extracting various features of the merge scenarios as well as executing the merge using different merge tools, detecting merge conflicts, and finding compilation and test problems with the merge resolution.

The toolchain has been tested with Ubuntu 20.04.

Setup

  1. Clone this repository:
    git clone https://github.com/ualberta-smr/merganser
    cd merganser
    
  2. Install the dependencies.
    pip3 install -r requirements.txt
    

Usage

  1. Configure config.py: The predefined paths, database information, constants, and access keys are stored in the config.py file. The wiki page describes all of these parameters in full. The only parameters you must set before using Merganser are the GitHub access keys and the database parameters.

  2. Add the list of repositories: The main program takes a list of repositories to analyze as its input. There are two ways to create such a list:

    • Add the repository list manually: If you already know which repositories to analyze, write them in a *.txt file (one repository per line) and copy the file into ./working_dir/repository_list (this path is REPOSITORY_LIST_PATH, which is set in config.py).

    • Automatic search: If you do not have specific repositories in mind but want to analyze repositories within a given range of stars, watchers, forks, or size, or from a specific application domain, you can generate the list with search_repository.py. See the wiki page for the parameters of this module.

  3. There are two ways to run the tool, depending on your goal. The results are stored in CSV files.
    • Execute the tool to extract all available data:
     bash ./run_all.sh <list_of_repositories>
    
    • Execute the tool to extract the data for conflict prediction:
     bash ./run_predict.sh <list_of_repositories>
    
  4. Next, store the CSV files in the MySQL database:
     python3 ./data_conversion.py
  5. For conflict prediction, first create the data:
     python3 ./data_prediction.py
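The config.py step above can be illustrated with a minimal sketch. The variable names below (GITHUB_KEYS, DB_HOST, DB_USER, DB_PASSWORD, and so on) are hypothetical placeholders, not Merganser's actual parameter names, which are documented on the wiki; the sketch also assumes you prefer reading secrets from environment variables rather than hard-coding them.

```python
import os

# Hypothetical config.py fragment: names are illustrative, not Merganser's own.
# GitHub access keys, read from a comma-separated environment variable.
GITHUB_KEYS = [key for key in os.environ.get("GITHUB_KEYS", "").split(",") if key]

# Hypothetical MySQL connection parameters (3306 is MySQL's default port).
DB_HOST = os.environ.get("DB_HOST", "localhost")
DB_PORT = int(os.environ.get("DB_PORT", "3306"))
DB_USER = os.environ.get("DB_USER", "")
DB_PASSWORD = os.environ.get("DB_PASSWORD", "")
DB_NAME = os.environ.get("DB_NAME", "merganser")

# The settings this sketch treats as mandatory before a run.
REQUIRED_SETTINGS = ("GITHUB_KEYS", "DB_USER", "DB_PASSWORD")


def missing_settings(settings):
    """Return the required setting names that are absent or empty in `settings`."""
    return [name for name in REQUIRED_SETTINGS if not settings.get(name)]


if __name__ == "__main__":
    missing = missing_settings(globals())
    if missing:
        print("Set these before running Merganser:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running such a file directly prints any required settings that are still empty, which catches a missing access key before a long run starts.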
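For the repository-list step, the following sketch writes a list file in the one-repository-per-line layout described above and builds a GitHub search URL from a star range. It assumes each line holds an owner/name pair and that REPOSITORY_LIST_PATH keeps its default of ./working_dir/repository_list; both helper functions are hypothetical and are not part of Merganser. The qualifiers (stars:, language:, topic:) are GitHub's own search syntax; whether search_repository.py uses them internally is an assumption, not something this README states.

```python
from pathlib import Path
from urllib.parse import urlencode

# Default value of REPOSITORY_LIST_PATH according to the README.
REPOSITORY_LIST_DIR = Path("./working_dir/repository_list")


def write_repository_list(repos, name, directory=REPOSITORY_LIST_DIR):
    """Write repositories, one per line, to <directory>/<name>.txt (hypothetical helper)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Trim whitespace, drop blank entries, and remove duplicates while keeping order.
    unique = list(dict.fromkeys(r.strip() for r in repos if r.strip()))
    path = directory / f"{name}.txt"
    path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return path


def github_search_url(min_stars, max_stars, language=None, topic=None):
    """Build a GitHub REST API repository-search URL (hypothetical helper)."""
    qualifiers = [f"stars:{min_stars}..{max_stars}"]
    if language:
        qualifiers.append(f"language:{language}")
    if topic:
        qualifiers.append(f"topic:{topic}")
    # urlencode turns spaces into '+' and percent-encodes ':' in the query.
    params = urlencode({"q": " ".join(qualifiers), "per_page": 100})
    return "https://api.github.com/search/repositories?" + params
```

The resulting URL can be fetched with any HTTP client, ideally authenticated with one of the GitHub access keys from config.py; the full_name field of each returned item has the owner/name form that write_repository_list expects.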
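To run one of the run scripts from step 3 over several list files in turn, a small Python wrapper like the following could be used. It assumes that <list_of_repositories> is the path of a single .txt file in the repository-list directory and that the run scripts are shell scripts, as their .sh extension suggests; run_lists is a hypothetical helper, and dry_run=True only returns the commands without executing them.

```python
import subprocess
from pathlib import Path


def run_lists(script, list_dir, dry_run=False):
    """Run `script` once per *.txt repository list in `list_dir` (hypothetical helper)."""
    commands = []
    # Sort so the lists are processed in a stable, predictable order.
    for list_file in sorted(Path(list_dir).glob("*.txt")):
        cmd = ["bash", script, str(list_file)]
        commands.append(cmd)
        if not dry_run:
            # check=True stops the loop with CalledProcessError on a non-zero exit.
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_lists("./run_all.sh", "./working_dir/repository_list", dry_run=True) shows which commands would run before committing to a long analysis.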
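data_conversion.py loads the CSV results into Merganser's normalized MySQL schema. The sketch below only illustrates the general CSV-to-table idea: it swaps in Python's built-in sqlite3 module as a stand-in for MySQL and stores every column as TEXT. load_csv and the table and file names are hypothetical and do not reflect Merganser's schema or output files.

```python
import csv
import sqlite3


def load_csv(conn: sqlite3.Connection, csv_path: str, table: str) -> int:
    """Create `table` from the CSV header, insert all rows, and return the row count."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # One TEXT column per header field; quoting allows spaces and keywords in names.
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        # executemany consumes the remaining CSV rows directly from the reader.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

With a MySQL driver the same pattern applies, although the placeholder syntax differs by driver (for example, %s instead of ?), and a real schema would use typed, normalized tables rather than one TEXT table per file.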

Conflict Prediction

The wiki page describes all possible parameters.

License

Merganser is released under the MIT License.

Support

Feel free to report any issues with Merganser on the repository's issue tracker. Questions about installing and running the tool can be directed to its creators, Moein Owhadi Kareshk and Sarah Nadi.

Contribution

Pull requests are very welcome if you have a change, bug fix, or other improvement in mind.