How to use pylazaro

To use pylazaro, create an instance of the class pylazaro.lazaro.Lazaro. This object is the tagger. Pass any Spanish text to the tagger through the method pylazaro.lazaro.Lazaro.analyze(), and the tagger returns the lexical borrowings it finds in the text as a pylazaro.outputs.LazaroOutput object.

Example

Here is a minimal example of how to use pylazaro:

>>> from pylazaro import Lazaro
>>> tagger = Lazaro()
>>> text = "Inteligencia artificial aplicada al sector del blockchain, la e-mobility y las smarts grids entre otros; favoreciendo las interacciones colaborativas."
>>> result = tagger.analyze(text)
>>> result.borrowings_to_tuple()
[('blockchain', 'en'), ('e-mobility', 'en'), ('smarts grids', 'en')]
>>> result.borrowings_to_dict()
[{'borrowing': 'blockchain', 'language': 'en', 'start_pos': 6, 'end_pos': 7}, {'borrowing': 'e-mobility', 'language': 'en', 'start_pos': 9, 'end_pos': 10}, {'borrowing': 'smarts grids', 'language': 'en', 'start_pos': 12, 'end_pos': 14}]
>>> result.tag_per_token()
[('Inteligencia', 'O'), ('artificial', 'O'), ('aplicada', 'O'), ('al', 'O'), ('sector', 'O'), ('del', 'O'), ('blockchain', 'B-ENG'), (',', 'O'), ('la', 'O'), ('e-mobility', 'B-ENG'), ('y', 'O'), ('las', 'O'), ('smarts', 'B-ENG'), ('grids', 'I-ENG'), ('entre', 'O'), ('otros', 'O'), (';', 'O'), ('favoreciendo', 'O'), ('las', 'O'), ('interacciones', 'O'), ('colaborativas', 'O'), ('.', 'O')]
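
The token-level tags returned by tag_per_token() follow a B-/I-/O pattern, so borrowing spans can be rebuilt from them by hand. The sketch below is not part of pylazaro: the function name bio_to_spans and the "label" key are hypothetical, and it assumes that end positions are exclusive token indices, as the start_pos/end_pos values in the example output above suggest.

```python
from typing import Dict, List, Tuple

# Output of result.tag_per_token() copied from the example above.
tagged: List[Tuple[str, str]] = [
    ('Inteligencia', 'O'), ('artificial', 'O'), ('aplicada', 'O'), ('al', 'O'),
    ('sector', 'O'), ('del', 'O'), ('blockchain', 'B-ENG'), (',', 'O'),
    ('la', 'O'), ('e-mobility', 'B-ENG'), ('y', 'O'), ('las', 'O'),
    ('smarts', 'B-ENG'), ('grids', 'I-ENG'), ('entre', 'O'), ('otros', 'O'),
    (';', 'O'), ('favoreciendo', 'O'), ('las', 'O'), ('interacciones', 'O'),
    ('colaborativas', 'O'), ('.', 'O'),
]


def bio_to_spans(pairs: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Group (token, tag) pairs into spans with end-exclusive token offsets."""
    spans: List[Dict[str, object]] = []
    current = None  # span being built, or None while outside a borrowing

    for i, (token, tag) in enumerate(pairs):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one first.
            if current is not None:
                spans.append(current)
            current = {"tokens": [token], "label": tag[2:],
                       "start_pos": i, "end_pos": i + 1}
        elif (tag.startswith("I-") and current is not None
              and tag[2:] == current["label"]):
            # An I- tag with the same label extends the open span.
            current["tokens"].append(token)
            current["end_pos"] = i + 1
        else:
            # 'O' (or a stray I- tag) closes the open span, if any.
            if current is not None:
                spans.append(current)
            current = None

    if current is not None:
        spans.append(current)

    return [
        {"borrowing": " ".join(s["tokens"]), "label": s["label"],
         "start_pos": s["start_pos"], "end_pos": s["end_pos"]}
        for s in spans
    ]


spans = bio_to_spans(tagged)
```

For the example sentence, the rebuilt spans carry the same borrowings and positions as borrowings_to_dict(). The only difference is that the label here comes from the tag suffix (ENG) rather than from the language code (en).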

Running pylazaro with other models

pylazaro can be run with five different types of models (see How does pylazaro work? in About pylazaro):

  1. A BiLSTM-CRF model fed with subword embeddings and lexical embeddings pretrained on codeswitching data (this is the best performing model, and the default model used by pylazaro)

  2. A BiLSTM-CRF model fed with subword embeddings and bilingual Transformer-based Spanish-English lexical embeddings

  3. A Transformer model based on multilingual BERT

  4. A Transformer model based on the Spanish model BETO

  5. A Conditional Random Field model with handcrafted features

By default, pylazaro uses the first model (BiLSTM-CRF with codeswitched embeddings), which performs best. A different model can be selected through the model_type and model_file arguments when instantiating pylazaro.lazaro.Lazaro:

>>> tagger_bilstm = Lazaro(model_type = 'bilstm', model_file="lirondos/anglicisms-spanish-flair-cs") # Equivalent to tagger_bilstm = Lazaro() and to tagger_bilstm = Lazaro(model_type = 'bilstm')
>>> tagger_bilstm_beto = Lazaro(model_type = 'bilstm', model_file="lirondos/anglicisms-spanish-flair-bert-beto")
>>> tagger_transformers = Lazaro(model_type = 'transformers', model_file="lirondos/anglicisms-spanish-mbert") # Equivalent to tagger_transformers = Lazaro(model_type = 'transformers')
>>> tagger_beto = Lazaro(model_type = 'transformers', model_file="lirondos/anglicisms-spanish-beto")
>>> tagger_crf = Lazaro(model_type = 'crf') # Requires extended installation

Warning

Running the CRF model requires the extended installation (see Installation guide). However, we don’t recommend using the CRF model: it is the worst-performing of the three model types (bilstm, transformers and crf), and the extended installation takes up significantly more space.
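
When switching between models often, it can help to keep the model_type and model_file pairs listed in this section in one place. The following is a minimal sketch, not part of pylazaro: the short names used as dictionary keys and the helper lazaro_kwargs are hypothetical, while the argument values are the ones shown in the examples above.

```python
from typing import Dict

# Keyword arguments for Lazaro(), as listed in this section.
# The short names used as keys are hypothetical, for illustration only.
MODEL_CONFIGS: Dict[str, Dict[str, str]] = {
    "bilstm-cs": {"model_type": "bilstm",
                  "model_file": "lirondos/anglicisms-spanish-flair-cs"},
    "bilstm-beto": {"model_type": "bilstm",
                    "model_file": "lirondos/anglicisms-spanish-flair-bert-beto"},
    "mbert": {"model_type": "transformers",
              "model_file": "lirondos/anglicisms-spanish-mbert"},
    "beto": {"model_type": "transformers",
             "model_file": "lirondos/anglicisms-spanish-beto"},
    "crf": {"model_type": "crf"},  # requires the extended installation
}


def lazaro_kwargs(name: str = "bilstm-cs") -> Dict[str, str]:
    """Return a copy of the keyword arguments for the named configuration."""
    try:
        return dict(MODEL_CONFIGS[name])
    except KeyError:
        choices = ", ".join(sorted(MODEL_CONFIGS))
        raise ValueError(f"Unknown model {name!r}; choose from: {choices}") from None


# With pylazaro installed, a tagger could then be built as:
#     from pylazaro import Lazaro
#     tagger = Lazaro(**lazaro_kwargs("beto"))
```

The helper returns a copy of the stored dictionary so that callers can adjust the arguments without changing the shared table.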