Article
· July 18, 2024 · 8 min read

Step-by-step guide to creating a custom chatbot using spaCy (a Python NLP library)

Hi Community,

In this article, I will demonstrate the following steps for building your own chatbot using spaCy (spaCy is an open-source software library for advanced natural language processing, written in Python and Cython):

  • Step 1: Install the required libraries

  • Step 2: Create the patterns and responses file

  • Step 3: Train the model

  • Step 4: Create a ChatBot application based on the trained model

Let's get started.

Step 1: Install the required libraries

First, we need to install the required Python libraries, which we can do by running the following commands:

pip3 install spacy nltk
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Step 2: Create the patterns and responses file

We need to create an intents.json file containing question patterns and their responses. Each intent has a tag, a list of patterns (example user inputs), and a list of responses from which the bot picks one. Here is an example with a few intents:

{
  "intents": [
    {
      "tag": "greeting",
      "patterns": [
        "Hi",
        "Hey",
        "How are you",
        "Is anyone there?",
        "Hello",
        "Good day"
      ],
      "responses": [
        "Hey :-)",
        "Hello, thanks for visiting",
        "Hi there, what can I do for you?",
        "Hi there, how can I help?"
      ]
    },
    {
      "tag": "goodbye",
      "patterns": ["Bye", "See you later", "Goodbye"],
      "responses": [
        "See you later, thanks for visiting",
        "Have a nice day",
        "Bye! Come back again soon."
      ]
    },
    {
      "tag": "thanks",
      "patterns": ["Thanks", "Thank you", "That's helpful", "Thank's a lot!"],
      "responses": ["Happy to help!", "Any time!", "My pleasure"]
    },
    {
      "tag": "items",
      "patterns": [
        "tell me about this app",
        "What kinds of technology used?",
        "What do you have?"
      ],
      "responses": [
        "Write something about the app."
      ]
    },        
    {
      "tag": "funny",
      "patterns": [
        "Tell me a joke!",
        "Tell me something funny!",
        "Do you know a joke?"
      ],
      "responses": [
        "Why did the hipster burn his mouth? He drank the coffee before it was cool.",
        "What did the buffalo say when his son left for college? Bison."
      ]
    }
  ]
}
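
Since both training and chat depend on the shape of intents.json, it can help to check the file before training. The sketch below is only an illustration and not part of the original project: the `validate_intents` helper and its messages are hypothetical names, and it assumes the three keys shown above (`tag`, `patterns`, `responses`) are all required.

```python
import json

# Keys every intent is assumed to need, based on the sample file above.
REQUIRED_KEYS = ("tag", "patterns", "responses")

def validate_intents(data):
    """Return a list of human-readable problems found in an intents dict."""
    problems = []
    seen_tags = set()
    for i, intent in enumerate(data.get("intents", [])):
        for key in REQUIRED_KEYS:
            if key not in intent:
                problems.append(f"intent #{i}: missing key '{key}'")
        tag = intent.get("tag")
        if tag is not None:
            if tag in seen_tags:
                problems.append(f"intent #{i}: duplicate tag '{tag}'")
            seen_tags.add(tag)
        for key in ("patterns", "responses"):
            # an empty list would leave nothing to train on or reply with
            if key in intent and not intent[key]:
                problems.append(f"intent #{i}: '{key}' is empty")
    return problems

# Small inline sample; in practice you would json.load() the real file.
sample = json.loads("""
{"intents": [
  {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hey :-)"]},
  {"tag": "goodbye", "patterns": ["Bye"], "responses": []}
]}
""")
print(validate_intents(sample))
```

An empty result means the file has the structure the later scripts expect.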

Step 3: Train the model

3.1 - Create the NeuralNet model file, model.py, which will be used to train the model

#model.py file
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size) 
        self.l2 = nn.Linear(hidden_size, hidden_size) 
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end
        return out
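
To get a feel for how small this network is, the following sketch counts its trainable parameters by hand. It assumes each `nn.Linear` layer keeps its default bias term; the vocabulary size of 30 is a made-up figure for illustration, while the hidden size of 8 and the 5 classes mirror the training script and the sample intents.

```python
def linear_params(in_features, out_features):
    # one weight per (input, output) pair plus one bias per output unit
    return in_features * out_features + out_features

def neuralnet_params(input_size, hidden_size, num_classes):
    # three stacked linear layers, matching l1, l2 and l3 in NeuralNet
    return (linear_params(input_size, hidden_size)
            + linear_params(hidden_size, hidden_size)
            + linear_params(hidden_size, num_classes))

# input_size=30 is a hypothetical vocabulary size
print(neuralnet_params(input_size=30, hidden_size=8, num_classes=5))  # 365
```

A few hundred parameters is why the model trains in seconds on a CPU.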

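3.2 - We need the nltk_utils.py file, which uses the Python library "nltk" (a platform for building Python programs that work with human language data, for use in statistical natural language processing (NLP)). Note that word_tokenize relies on NLTK's Punkt tokenizer data, so if it raises a LookupError you will likely need to run nltk.download('punkt') once (recent NLTK releases ask for 'punkt_tab' instead).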

#nltk_utils.py
#nltk library used for building Python programs that work with human language data for applying in 
#statistical natural language processing (NLP)
import numpy as np
import nltk
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()

def tokenize(sentence):
    """
    split sentence into array of words/tokens
    a token can be a word or punctuation character, or number
    """
    return nltk.word_tokenize(sentence)


def stem(word):
    """
    stemming = find the root form of the word
    examples:
    words = ["organize", "organizes", "organizing"]
    words = [stem(w) for w in words]
    -> ["organ", "organ", "organ"]
    """
    return stemmer.stem(word.lower())


def bag_of_words(tokenized_sentence, words):
    """
    return bag of words array:
    1 for each known word that exists in the sentence, 0 otherwise
    example:
    sentence = ["hello", "how", "are", "you"]
    words = ["hi", "hello", "I", "you", "bye", "thank", "cool"]
    bag   = [  0 ,    1 ,    0 ,   1 ,    0 ,    0 ,      0]
    """
    # stem each word
    sentence_words = [stem(word) for word in tokenized_sentence]
    # initialize bag with 0 for each word
    bag = np.zeros(len(words), dtype=np.float32)
    for idx, w in enumerate(words):
        if w in sentence_words: 
            bag[idx] = 1
    return bag
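
The bag-of-words idea can be tried without installing NLTK. The sketch below is a simplified stand-in, not the code used by the article: `simple_tokenize` and `simple_bag_of_words` are hypothetical helpers that use a regular expression instead of `nltk.word_tokenize` and plain lowercasing instead of Porter stemming, so results will differ on contractions and inflected words.

```python
import re

def simple_tokenize(sentence):
    # crude stand-in for nltk.word_tokenize: runs of word characters,
    # or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)

def simple_bag_of_words(tokens, vocabulary):
    # lowercase only; the real helper also stems each token
    present = {t.lower() for t in tokens}
    # one slot per vocabulary word, in vocabulary order
    return [1.0 if w in present else 0.0 for w in vocabulary]

vocabulary = ["hi", "hello", "i", "you", "bye", "thank", "cool"]
tokens = simple_tokenize("Hello, how are you?")
print(tokens)                                   # ['Hello', ',', 'how', 'are', 'you', '?']
print(simple_bag_of_words(tokens, vocabulary))  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

The vector length always equals the vocabulary size, which is why train.py uses it as the network's input size.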

3.3 - Next we train the model, using the train.py file to build it

#train.py
import numpy as np
import json,torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from nltk_utils import bag_of_words, tokenize, stem
from model import NeuralNet
 
with open('intents.json', 'r') as f:
    intents = json.load(f)

all_words = []
tags = []
xy = []
# loop through each sentence in our intents patterns
for intent in intents['intents']:
    tag = intent['tag']
    # add to tag list
    tags.append(tag)
    for pattern in intent['patterns']:
        # tokenize each word in the sentence
        w = tokenize(pattern)
        # add to our words list
        all_words.extend(w)
        # add to xy pair
        xy.append((w, tag))

# stem and lower each word
ignore_words = ['?', '.', '!']
all_words = [stem(w) for w in all_words if w not in ignore_words]
# remove duplicates and sort
all_words = sorted(set(all_words))
tags = sorted(set(tags))

print(len(xy), "patterns")
print(len(tags), "tags:", tags)
print(len(all_words), "unique stemmed words:", all_words)

# create training data
X_train = []
y_train = []
for (pattern_sentence, tag) in xy:
    # X: bag of words for each pattern_sentence
    bag = bag_of_words(pattern_sentence, all_words)
    X_train.append(bag)
    # y: PyTorch CrossEntropyLoss needs only class labels, not one-hot
    label = tags.index(tag)
    y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

# Hyper-parameters 
num_epochs = 1000
batch_size = 8
learning_rate = 0.001
input_size = len(X_train[0])
hidden_size = 8
output_size = len(tags)
print(input_size, output_size)

class ChatDataset(Dataset):
    def __init__(self):
        self.n_samples = len(X_train)
        self.x_data = X_train
        self.y_data = y_train

    # support indexing such that dataset[i] can be used to get i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        return self.n_samples

dataset = ChatDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = NeuralNet(input_size, hidden_size, output_size).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# Train the model
for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)        
        # Forward pass
        outputs = model(words)
        # if y would be one-hot, we must apply
        # labels = torch.max(labels, 1)[1]
        loss = criterion(outputs, labels)        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()        
    if (epoch+1) % 100 == 0:
        print (f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print(f'final loss: {loss.item():.4f}')
data = {
    "model_state": model.state_dict(),
    "input_size": input_size,
    "hidden_size": hidden_size,
    "output_size": output_size,
    "all_words": all_words,
    "tags": tags
}
FILE = "chatdata.pth"
torch.save(data, FILE)
print(f'training complete. file saved to {FILE}')
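
The training script feeds raw network outputs (logits) and integer class labels to `nn.CrossEntropyLoss`, which applies log-softmax internally and averages over the batch by default. The sketch below reproduces that computation in plain Python to show why the model has no softmax layer and why one-hot labels are not needed. The helper names and the sample logits are illustrative only.

```python
import math

def log_softmax(logits):
    # subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(batch_logits, labels):
    # negative log-probability of the correct class, averaged over the batch
    losses = [-log_softmax(logits)[label]
              for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# two samples, three classes; labels are class indices, not one-hot vectors
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
print(cross_entropy(batch_logits, labels))
```

The second sample has uniform logits, so its loss is log(3); a confident correct prediction like the first sample contributes a much smaller value.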

3.4 - Run the train.py file above to train the model

Running train.py creates the chatdata.pth file, which is used by our chat.py file.

Step 4: Create a ChatBot application based on the trained model

Now that our model has been created, the final step is to create a chat.py file that we can use in our chatbot.

#chat.py
import os,random,json,torch
from model import NeuralNet
from nltk_utils import bag_of_words, tokenize

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with open("intents.json") as json_data:
    intents = json.load(json_data)

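# locate chatdata.pth next to this script; map_location lets a checkpoint
# trained on a GPU be loaded on a CPU-only machine
data_dir = os.path.dirname(os.path.abspath(__file__))
FILE = os.path.join(data_dir, 'chatdata.pth')
data = torch.load(FILE, map_location=device)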

input_size = data["input_size"]
hidden_size = data["hidden_size"]
output_size = data["output_size"]
all_words = data['all_words']
tags = data['tags']
model_state = data["model_state"]

model = NeuralNet(input_size, hidden_size, output_size).to(device)
model.load_state_dict(model_state)
model.eval()

bot_name = "iris-NLP"
def get_response(msg):
    sentence = tokenize(msg)
    X = bag_of_words(sentence, all_words)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X).to(device)

    output = model(X)
    _, predicted = torch.max(output, dim=1)

    tag = tags[predicted.item()]

    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]
    if prob.item() > 0.75:
        for intent in intents['intents']:
            if tag == intent["tag"]:
                return random.choice(intent['responses'])
    return "I do not understand..."
if __name__ == "__main__":
    print("Let's chat! (type 'quit' to exit)")
    while True:
        sentence = input("You: ")
        if sentence == "quit":
            break
        resp = get_response(sentence)
        print(resp)
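
The key decision in `get_response` is the confidence check: the bot answers only when the softmax probability of the top class exceeds 0.75, and otherwise falls back to "I do not understand...". The sketch below isolates that logic in plain Python. `pick_tag`, the tag list, and the logits are made up for illustration; the 0.75 threshold is the one used above.

```python
import math

def softmax(logits):
    # subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_tag(logits, tags, threshold=0.75):
    """Return the most likely tag, or None if the model is not confident."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return tags[best]
    return None

tags = ["funny", "goodbye", "greeting"]
print(pick_tag([4.0, 0.5, 0.2], tags))  # one clear winner -> funny
print(pick_tag([1.0, 0.8, 0.9], tags))  # near-uniform scores -> None
```

Raising the threshold makes the bot more cautious with unfamiliar input; lowering it makes it answer more often at the cost of more wrong guesses.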

Run chat.py and ask some of the questions we defined earlier in our intents.json file.

For more information, visit the IRIS-GenLab application page on Open Exchange.

Thanks

Question
· July 18, 2024

Finding all usages of deprecated classes and methods

Hi folks,

We are in the process of migrating from Ensemble to IRIS.

After the migration, I would like to find all usages of deprecated classes and methods in all of our code. This is to align with changes in packages such as %SYSTEM.SQL, for example.

Is there an easy way to do this, or maybe a tool that can assist?

Thanks.

7 Comments
Article
· July 17, 2024 · 2 min read

Gracefully Shutting Down IRIS Without Terminal Access: *nix Flavor

I found myself in the not-so-comfortable situation of working with a Linux system on which someone had accidentally disabled user access to the Linux shell. HealthConnect was running, servicing hundreds of interfaces. To resolve the access issue, though, we needed to bring the host down for the application of a fix.

Without the shell, the iris command is not available to control the instance, so we were faced with the potential of shutting down the server ungracefully. We wanted to avoid that if possible ...

The ^SHUTDOWN routine was historically an option for shutting down Caché, but you need a terminal session to execute it (we'll be talking more about what qualifies as a terminal session in a minute). But ^SHUTDOWN is now deprecated, and when you execute it, you get the message "Please use the 'iris stop' procedure to shut down the system."

So cross that off the list ... and replace it with INTNOSHUT^SHUTDOWN. Yes, running this command will gracefully halt IRIS. And yes, you need an IRIS command shell to execute it. So where do you get an IRIS command shell for the system you're locked out of, you ask?

In the not-long-for-this-world IRIS Studio, of course! The Output window allows you to execute IRIS commands, and this won't be a surprise to many. It will certainly allow you to run D INTNOSHUT^SHUTDOWN in the output window (after switching to the %SYS namespace). However, if you do exactly that IRIS will most likely start to shut down and then hang, since Studio keeps a session open. It may never completely shut down, and you would have no way to force it down other than to halt the operating system.

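That said, you can achieve a full shutdown by using the command JOB INTNOSHUT^SHUTDOWN, then immediately exiting Studio. Because JOB runs the routine in a separate background process, closing Studio no longer leaves a session holding the shutdown open. IRIS will (more likely than not) shut down gracefully and you can feel better about doing things the "right" way ... even if it feels wrong.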

As far as regaining user access to the Linux shell is concerned, that's a topic for another forum. But now that IRIS is safely shut down, the issue with access can be resolved (some disassembly probably required).

7 Comments
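Question
· July 17, 2024

How to set the character set when IRIS connects to a third-party Oracle database via JDBC

When IRIS connects to a third-party database via JDBC, queries return garbled Chinese characters. The third party says a character set needs to be configured, but I don't know how to set it.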

InterSystems Official
· July 17, 2024

MacOS only - End of maintenance for Caché and Ensemble on MacOS

As of October 15, 2024, support for Caché and Ensemble on MacOS will be discontinued.

Caché & Ensemble 2018.1.9 will continue to be supported; however, there will be no further maintenance releases for MacOS. This means that Caché & Ensemble 2018.1.9 will be the final version of these products on MacOS.

As a reminder, maintenance releases of Caché and Ensemble on the other supported platforms will end on March 31, 2027.
More details can be found in last year's announcement.
