Summaries Annotation Platform

Welcome to the code summary annotation platform!

These summaries are either written by a domain expert or generated by a pre-trained LLM. Some LLM-generated summaries follow our approach, which considers the context of the method being analyzed.

During your session, you will annotate these summaries without knowing whether they are human- or machine-generated.

Please evaluate each summary on two criteria: precision (consistency), i.e. accuracy with respect to the source code, and recall (relevance), i.e. completeness of the conveyed information. For each criterion, assign a score from 1 to 4 (4 being the highest).

  • Precision (Consistency):
    • 1 – Very Poor: Contains hallucinated facts; the summary is unusable.
    • 2 – Poor: Explains the code poorly and misses important elements.
    • 3 – Good: The summary matches the source code; any remaining errors are tolerable.
    • 4 – Excellent: The summary accurately reflects the source code, with no errors.
  • Recall (Relevance):
    • 1 – Strongly Disagree: Misses all important information about the source code.
    • 2 – Disagree: Misses some major information about the code.
    • 3 – Agree: Contains the important information.
    • 4 – Strongly Agree: Contains all important information.