Juglar Diaz

Data mining, machine learning and natural language processing engineer.

Santiago, Chile

Summary

More than eight years of experience in research and applied projects in data mining, machine learning and natural language processing.

Languages:

English, Spanish

Favorite Python Packages:

pandas, tensorflow, pytorch, sklearn, scipy

Experience

I have been involved in data mining and machine learning projects related to natural language processing and text mining, mainly text classification and social media text mining. I have also worked with spatio-temporal data. I have experience with Google Cloud Platform and hold professional certifications for it.

  • In a general-purpose Natural Language Processing system, one of the first steps is to identify the language in which a text is written. Solutions to this task must be fast and simple. For social media data such as tweets, the text is often only a few words long and written in informal language, which makes the problem harder. I worked on a Python package for language identification of tweets using character n-grams and feature selection methods.
  • Coreference resolution is about detecting which parts of a text refer to the same entity. This is a difficult task in Natural Language Processing because it demands a deep understanding of language and of how it relates to our representation of world knowledge. I implemented a C++ module for coreference resolution in Spanish texts. I used an unsupervised approach and worked mainly with graph-based clustering algorithms.
  • Spatio-temporal textual data is usually represented as a record in the form of a 〈where, when, what〉 tuple, in which "where" is a location's latitude-longitude coordinates, "when" is a timestamp and "what" is the content, usually in text form. I have worked with neural network models for representing spatio-temporal textual data. I have used Python libraries for data mining, machine learning, neural networks and text mining such as pandas, tensorflow, pytorch and gensim.
  • I participated in a Google Cloud project to classify texts by topic and perform sentiment analysis. A shopping mall collected opinions from its customers and wanted to automate detecting the sentiment of each opinion and deciding which department should process it. We used Google Cloud services such as Cloud Storage, Cloud Dataflow, AutoML and BigQuery.
  • A news agency wanted to automate the classification of its articles into a hierarchy of topics. I built machine learning models for text classification using pre-trained embedding models and topic modeling techniques. We used Python libraries for natural language processing, data mining, text mining and machine learning such as sklearn, nltk, pandas and gensim.
  • I participated in a project that studied spatio-temporal patterns of crime incidents. I combined multiple data sources to build time-series and spatial prediction models. We used Python libraries for natural language processing, data mining, text mining and machine learning such as sklearn, nltk, pandas, matplotlib and gensim.
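
To illustrate the language identification approach in the first project above, here is a minimal sketch in plain Python. It is not the package I built: the function names (char_ngrams, train_profiles, identify), the toy training sentences, the choice of character bigrams and trigrams, and the top_k frequency cutoff standing in for a real feature selection method are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Yield character n-grams from a lowercased, space-padded text."""
    padded = f" {text.lower().strip()} "
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]


def train_profiles(samples, top_k=300):
    """Build one unit-length n-gram profile per language.

    Keeping only the top_k most frequent n-grams is a crude stand-in
    for feature selection (an assumption of this sketch).
    """
    counts = {}
    for lang, text in samples:
        counts.setdefault(lang, Counter()).update(char_ngrams(text))
    profiles = {}
    for lang, counter in counts.items():
        selected = dict(counter.most_common(top_k))
        norm = math.sqrt(sum(v * v for v in selected.values()))
        profiles[lang] = {gram: v / norm for gram, v in selected.items()}
    return profiles


def identify(text, profiles):
    """Return the language whose profile is most similar to the text.

    The score is the dot product with each unit-length profile; the
    text's own norm is the same for every language, so it is skipped.
    """
    vec = Counter(char_ngrams(text))
    scores = {
        lang: sum(weight * vec[gram] for gram, weight in profile.items())
        for lang, profile in profiles.items()
    }
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
TRAIN = [
    ("en", "the quick brown fox jumps over the lazy dog and this is what we think about it"),
    ("en", "i would like to know where you are going with them this weekend"),
    ("es", "el rápido zorro marrón salta sobre el perro perezoso y esto es lo que pensamos"),
    ("es", "me gustaría saber dónde vas a ir con ellos este fin de semana"),
]

profiles = train_profiles(TRAIN)
print(identify("where are you going", profiles))
print(identify("dónde vas con ellos", profiles))
```

Character n-grams suit short, informal texts such as tweets because they need no tokenizer or dictionary, and a handful of frequent n-grams per language already separates languages reasonably well. A real system would train on far more data and use a proper feature selection criterion.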

Skills

Artificial Intelligence, Data Science, Google Cloud Platform (GCP), Keras, Machine Learning, Natural Language Processing, NumPy, Pandas, TensorFlow, Visualization

Joined: September 2019