Andres Soto Villaverde

Data scientist NLP Python

Elche, Spain


Experience in Artificial Intelligence (Data Science, Machine Learning, Natural Language Processing, Sentiment Analysis, Recommender Systems), Python (scikit-learn, NLTK, SciPy, NumPy, Matplotlib, NetworkX, pandas, etc.), MATLAB, MongoDB, Apache Lucene/Solr, MySQL
PhD in Artificial Intelligence. Graduated in Numerical Analysis.


English, Spanish

Favorite Python Packages:

scikit-learn, NLTK, SciPy, NumPy, Matplotlib, NetworkX, pandas


Artificial Intelligence Software Engineer, Prisma Analytics, Oct. 2018 – Apr. 2019

Project: applications for Natural Language Processing

Technologies: Python, Conditional Random Fields (CRF), NLTK, JSON, sklearn_crfsuite, scikit-learn, NetworkX, MongoDB, Stanford CoreNLP Dependency Parser, CoNLL-U, Universal Dependencies, FrameNet, PropBank, VerbNet

Consultant for Machine Learning methods (especially “Natural Language Processing”), Neusta consulting GmbH, Germany, Sep. 2018 – Feb. 2019

Project: applications for Natural Language Processing

Technologies: Python, Conditional Random Fields (CRF), NLTK, JSON, sklearn_crfsuite, scikit-learn, Stanford CoreNLP Dependency Parser

Researcher + Developer + Project Leader, Fundamentia Business Consulting, Dec. 2017 – Oct. 2018

Project: applications for Natural Language Processing

Technologies: Python, scikit-learn (LinearSVC, SVM, SGDClassifier, MultinomialNB, Pipeline, GridSearchCV), LibSVM, Java, JSON, MongoDB, Linux.

Researcher + Developer + Artificial Intelligence Leader, Sukan Sport Technology S.L., Mar. 2017 – Nov. 2017

Project: applications for sport training.

Technologies: Python, Behavior Trees, Finite State Machines

Data Scientist freelancer, Feb. 2017 – May 2017

Project: expert system for psychology.

Technologies: Expert Systems, Fuzzy Logic, Topic Maps (Ontopia, Omnigator), Python (scikit-fuzzy, Pyke, Tkinter), SWI-Prolog

Researcher + Developer, Universidad de Castilla La Mancha, Ciudad Real, Nov. 2016 – Feb. 2017

Project: Intelligent Tutoring Systems (ITS) for Robot Programming

Technologies: Intelligent Tutoring Systems, Reinforcement Learning, Multi-Armed Bandit (MAB) problem, Bayesian Knowledge Tracing (BKT), Node.js, REST API, Postman, JavaScript, Mongoose, MongoDB, Trello, GitHub

Data Scientist freelancer, Jun. 2015 – Nov. 2016

Project: Headlines classifier for a localization system

Technologies: Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML, Naïve Bayes (Gaussian, Multinomial, Bernoulli)

Project: Sentiment Analysis for healthcare

Technologies: Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML, Naïve Bayes (Gaussian, Multinomial, Bernoulli)

Project: Identify sections in job descriptions using Machine Learning

Technologies: Python (NLTK, NumPy, SciPy, scikit-learn, Matplotlib), MongoDB, XML, Naïve Bayes (Gaussian, Multinomial, Bernoulli)

Researcher + Developer + Project Leader, 4d-life, Barcelona, Spain, Nov. 2014 – Jun. 2015

Project: Natural Language Processing (Clustering) of company documents

Technologies: Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, PyLucene, MySQL, Apache Lucene/Solr, k-means++, DBSCAN, hierarchical clustering, Dimensionality Reduction, Latent Semantic Analysis (LSA), Singular Value Decomposition (SVD), Gradient methods: gradient descent, steepest descent, conjugate gradient.

Researcher + Developer + Project Leader, BITYVIP Technology Ltd., Zaragoza, Spain, Aug. 2012 – Oct. 2014

Project: Content-based filtering recommender system based on Natural Language Processing (Clustering) of documents from social networks and digital news.

Technologies: Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas, MySQL, MongoDB, k-means++, DBSCAN, hierarchical clustering

Project: Sentiment analysis of customer review data

Technologies: Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas, MySQL, MongoDB, HTML, PHP, CSS, Naïve Bayes (Gaussian, Multinomial, Bernoulli), Logistic regression, Perceptron, Ridge regression, Passive Aggressive, Support Vector Machine, SGD (Stochastic Gradient Descent), Nearest Centroid

Project: Estimation of the influence of news on opinions in social networks

Technologies: Python: NLTK, NumPy, SciPy, pandas, scikit-learn, Matplotlib, MySQL, MongoDB, linear and nonlinear regression (one-dimensional, multidimensional), ordinary least squares (OLS)

Project: ETL (Extract, Transform, Load) of CSV files into MongoDB and MySQL

Technologies: Python: NLTK, NumPy, SciPy, pandas, MySQL, MongoDB

Project: Software application for airfare reservation using clustering

Technologies: Python: NLTK, NumPy, SciPy, scikit-learn, Matplotlib, NetworkX, pandas, MySQL, MongoDB, HTML, PHP, CSS, Self-Organizing Maps (SOM)


Apache, Artificial Intelligence, Data Science, Machine Learning, MongoDB, MySQL, Natural Language Processing, NumPy, Pandas, SciPy, Solr

Joined: June 2019