
code2vec

A PyTorch implementation of "code2vec: Learning Distributed Representations of Code".

Requirements

  • Python 3.6+
  • PyTorch 1.1+
  • scikit-learn
  • tensorboardX (optional)
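
The requirements above can be checked before training with a short script. This is a minimal sketch, not part of the repository: the import names `torch`, `sklearn`, and `tensorboardX` are the usual module names for the listed packages, and the helper names `missing_modules` and `check_environment` are hypothetical. It only checks that the packages can be found; it does not verify that PyTorch is at least 1.1.

```python
import importlib.util
import sys

# Usual import names for the packages listed in Requirements (assumed here).
REQUIRED = ["torch", "sklearn"]
OPTIONAL = ["tensorboardX"]


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    for name in missing_modules(REQUIRED):
        problems.append("missing required package: " + name)
    for name in missing_modules(OPTIONAL):
        problems.append("missing optional package: " + name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```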

Usage

Train with "dataset":

python main.py --lr 0.01 --corpus_path ./dataset/corpus.txt --path_idx_path ./dataset/path_idxs.txt --terminal_idx_path ./dataset/terminal_idxs.txt --model_path ./output --vectors_path ./output/code.vec --terminal_embed_size 100 --path_embed_size 100 --encode_size 100 --max_epoch 40 --random_seed 1 --dropout_prob 0.25
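
Because the command is long, it can be easier to keep the options in one place and build the argument list from them. The following is a minimal sketch under that assumption: `build_train_command` and `SMALL_DATASET_OPTIONS` are hypothetical names, and the flags and values are copied from the command above rather than from the source of `main.py`.

```python
import subprocess
import sys

# Options copied from the "dataset" training command above.
SMALL_DATASET_OPTIONS = {
    "lr": 0.01,
    "corpus_path": "./dataset/corpus.txt",
    "path_idx_path": "./dataset/path_idxs.txt",
    "terminal_idx_path": "./dataset/terminal_idxs.txt",
    "model_path": "./output",
    "vectors_path": "./output/code.vec",
    "terminal_embed_size": 100,
    "path_embed_size": 100,
    "encode_size": 100,
    "max_epoch": 40,
    "random_seed": 1,
    "dropout_prob": 0.25,
}


def build_train_command(options, script="main.py"):
    """Turn an options mapping into an argv list: [python, script, --flag, value, ...]."""
    command = [sys.executable, script]
    for flag, value in options.items():
        command += ["--" + flag, str(value)]
    return command


if __name__ == "__main__":
    # Runs the same training command as above; raises if main.py exits with an error.
    subprocess.run(build_train_command(SMALL_DATASET_OPTIONS), check=True)
```

To try a different setting, copy the mapping and override a single key, for example `dict(SMALL_DATASET_OPTIONS, max_epoch=20)`.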

Train with the large "top11_dataset":

First, concatenate the split corpus files:

cat ./top11_dataset/splitted_corpus.* > ./top11_dataset/corpus.txt
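
On systems without `cat` (for example a plain Windows shell), the same concatenation can be done in Python. This is a sketch rather than part of the repository: `concatenate_parts` is a hypothetical helper, and it merges the parts in name-sorted order, which is assumed to match the order the shell glob would produce.

```python
import glob
import shutil
from pathlib import Path


def concatenate_parts(pattern, output_path):
    """Concatenate all files matching `pattern` into `output_path`.

    Parts are merged in name-sorted order. Returns the number of parts merged.
    """
    target = Path(output_path).resolve()
    # Skip the output file itself in case the pattern happens to match it.
    parts = sorted(p for p in glob.glob(pattern) if Path(p).resolve() != target)
    if not parts:
        raise FileNotFoundError("no files match " + repr(pattern))
    with open(output_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as source:
                # Stream in chunks so large corpus parts are not loaded into memory.
                shutil.copyfileobj(source, merged)
    return len(parts)


if __name__ == "__main__":
    count = concatenate_parts(
        "./top11_dataset/splitted_corpus.*", "./top11_dataset/corpus.txt"
    )
    print("merged", count, "files")
```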

Then train the model:

python main.py --batch_size 1024 --lr 0.01 --corpus_path ./top11_dataset/corpus.txt --path_idx_path ./top11_dataset/path_idxs.txt --terminal_idx_path ./top11_dataset/terminal_idxs.txt --model_path ./output --vectors_path ./output/code.vec --terminal_embed_size 100 --path_embed_size 100 --encode_size 100 --max_epoch 20 --random_seed 1 --dropout_prob 0.25

License

MIT License
