Data

Downloading the Data

To download the data, run the following commands from the root of this project:

mkdir data/
curl --location --request GET 'http://virulent.cs.umd.edu:3000/sessions' > data/sessions.json

Understanding Stored Data

We record VS Code event data in JSON files. The meaning of each field is not trivial, so we describe the fields below. We record two types of events: mouse movements/highlights and keyboard events. The comments in the mouse example are taken from the VS Code API reference: https://code.visualstudio.com/api/references/vscode-api

Mouse movements and highlights

{
    // The position of the cursor.
    "active": {
        "line": 0,
        "character": 1
    },
    // The position at which the selection starts (equal to either start or end position)
    "anchor": {
        "line": 0,
        "character": 1
    },
    // The end position of the selection.
    "end": {
        "line": 0,
        "character": 1
    },
    // A selection is reversed if its anchor is the end position.
    "isReversed": true,
    // The start position of the selection.
    "start": {
        "line": 0,
        "character": 1
    }
}
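As a rough illustration of how these fields relate to each other, the sketch below summarizes one selection object. The helper name describe_selection and the keys it returns are hypothetical and not part of this project; it only assumes the field layout shown above.

```python
def describe_selection(sel: dict) -> dict:
    """Summarize one recorded selection object (hypothetical helper)."""
    # Positions compare naturally as (line, character) tuples.
    start = (sel["start"]["line"], sel["start"]["character"])
    end = (sel["end"]["line"], sel["end"]["character"])
    anchor = (sel["anchor"]["line"], sel["anchor"]["character"])
    active = (sel["active"]["line"], sel["active"]["character"])
    return {
        # start == end means nothing is highlighted: a plain cursor move.
        "is_empty": start == end,
        # The cursor ends up before the anchor when the user selects backwards.
        "dragged_backwards": active < anchor,
        # Number of lines the selection touches, inclusive.
        "lines_spanned": end[0] - start[0] + 1,
    }
```

An event whose four positions are identical, as in the example above, is a cursor move with no highlighted text.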

Keyboard Events

{
    "startLine": 0,
    "startChar": 1,
    "endLine": 0,
    "endChar": 1,
    "textChange": "H",
    "testsPassed": [],
    "time": 1666896719657
}
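One way to read a keyboard event is as "replace the range from (startLine, startChar) to (endLine, endChar) with textChange". The sketch below labels an event under that reading. The rules are an assumption modeled on how VS Code reports document changes, and classify_event is a hypothetical name, not a function of this project.

```python
def classify_event(event: dict) -> str:
    """Label a keyboard event as insert, replace, or delete (assumed rules)."""
    range_is_empty = (event["startLine"], event["startChar"]) == (
        event["endLine"],
        event["endChar"],
    )
    has_text = event["textChange"] != ""
    if range_is_empty and has_text:
        return "insert"   # new text typed at a single position
    if not range_is_empty and has_text:
        return "replace"  # existing range overwritten with new text
    if not range_is_empty:
        return "delete"   # existing range removed, nothing added
    return "noop"         # empty range and empty text: nothing changed
```

Under these assumed rules, the example event above (an empty range with textChange "H") would be labeled an insert.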

Loading data

proof_data_analysis.utils.get_num_tests_passed(tests_passed: Series) -> Series

Convert a series of passed-test lists into a series of counts.

Each resulting data point is the number of tests in the corresponding list.

e.g. [[1,2], [3,4], [1,2,3]] -> [2, 2, 3]
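The behavior described here can be reproduced with a one-line pandas call. This is a minimal sketch, not necessarily how the project implements it; count_tests_passed is a hypothetical stand-in name.

```python
import pandas as pd


def count_tests_passed(tests_passed: pd.Series) -> pd.Series:
    """Map each list of passed tests to its length (illustrative sketch)."""
    # Series.apply calls len on every element, keeping the original index.
    return tests_passed.apply(len)


counts = count_tests_passed(pd.Series([[1, 2], [3, 4], [1, 2, 3]]))
print(counts.tolist())  # [2, 2, 3]
```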

proof_data_analysis.utils.load_df(path_to_json: str = 'example.json') -> DataFrame

Load the JSON file containing the keylogged events and convert it to a pandas DataFrame.

The events are of three types: insert, replace, and delete.

Parameters

path_to_json – path to the JSON file with the keylogged events

Returns

a pandas DataFrame with one row per keylogged event
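For readers who want to see the general shape of such a loader, here is a minimal sketch. It assumes the file holds a single JSON array of flat keyboard-event objects like the one shown under Keyboard Events; the real file layout and the real load_df may differ, and load_events is a hypothetical name.

```python
import json

import pandas as pd


def load_events(path_to_json: str) -> pd.DataFrame:
    """Read a JSON array of event objects into a DataFrame (sketch)."""
    with open(path_to_json, encoding="utf-8") as f:
        events = json.load(f)  # assumed to be a list of dicts
    # Each dict becomes one row; its keys become the columns.
    return pd.DataFrame(events)
```

With this layout, columns such as textChange, testsPassed, and time come straight from the keys of each event object.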

proof_data_analysis.utils.times_to_seconds(time: Series) -> Series

Convert a series of timestamps to seconds.

Each resulting data point is the number of seconds elapsed since the first timestamp.
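The conversion can be sketched as a subtraction followed by a unit change. This assumes the time field holds Unix timestamps in milliseconds, which is what the sample value 1666896719657 suggests; ms_to_elapsed_seconds is a hypothetical name, not the project's function.

```python
import pandas as pd


def ms_to_elapsed_seconds(time: pd.Series) -> pd.Series:
    """Seconds elapsed since the first timestamp (assumes millisecond input)."""
    # Subtract the first value so the series starts at 0, then convert ms to s.
    return (time - time.iloc[0]) / 1000.0


elapsed = ms_to_elapsed_seconds(
    pd.Series([1666896719657, 1666896720657, 1666896722157])
)
print(elapsed.tolist())  # [0.0, 1.0, 2.5]
```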