Encontrar

記事
· 2025年3月1日 7m read

IRIS Vector Search for Matching Companies and Climate Action

Hey, community! 👋

We are a team of Stanford students applying technology to make sense of climate action. AI excites us because we know we can quickly analyze huge amounts of text.

As we require more reports on sustainability, such as responsibility reports and financial statements, it can be challenging to cut through the noise of aspirations and get to the real action: what are companies doing

That’s why we built a tool to match companies with climate actions scraped from company sustainability reports.

In this post, we’ll show you how to implement this tool as a chatbot interfacing with an InterSystems IRIS vector database.

  • Step 1: Setup
  • Step 2: Create database table
  • Step 3: Embed content and populate database
  • Step 4: Use vector search with user input
  • Step 5: Find climate actions

Step 1: Setup

Make sure to download Docker and then follow the InterSystems community hackathon start guide. The guide walks you through creating a new database container and connecting to the locally hosted web database portal. Navigate to http://localhost:52773/csp/sys/UtilHome.csp and use the credentials username: demo, password: demo to login.

Step 2: Create database table

Now, for the fun part!

First, we use Python to create the database to store our text chunks. We use the following script:

import pandas as pd
import numpy as np
import iris
import time
import os

### Vector Database Setup
username = "demo"
password = "demo"
hostname = os.getenv("IRIS_HOSTNAME", "localhost")
port = "1972"
namespace = "USER"
CONNECTION_STRING = f"{hostname}:{port}/{namespace}"
print(CONNECTION_STRING)

### Connect to IRIS
conn = iris.connect(CONNECTION_STRING, username, password)
cursor = conn.cursor()

### Create new table. Replace BASLABS to your prefix.
embedTableName = "BASLABS.ClimateReportsEmbed"
tableDefinition = """
( 
source_url VARCHAR(1000),
page INT,
total_pages INT,
content VARCHAR(5000), 
content_embedding VECTOR(DOUBLE, 384)
)
"""

### Re-create if table already exists
if CREATE_TABLE:
    try:
        cursor.execute(f"DROP TABLE {embedTableName}")
    except:
        pass
    cursor.execute(f"CREATE TABLE {embedTableName} {tableDefinition}")
  • We use iris.connect to create an SQL conn instance in Python and connect to the database. We can then use the cursor object to perform create/read/update/delete operations within the database.
  • We define the table structure as a string in the variable tableDefinition. Notably, we include a content_embedding field of type VECTOR(DOUBLE, 384). You must make sure that the number matches the dimension of your text embeddings. In this project, we are using 384 to match the “all-MiniLM-L6-v2” embedding model from the SentenceTransformers package.
  • Finally, we use cursor.execute to create the table.

Step 3: Embed content and populate database

Next, we will populate the database.

To do so, create a file with a list of PDF links you want to scrape. You can also use our list of links that we used during the Stanford TreeHacks hackathon. Your file should look something like this:

<http://website1.xyz/pdf-content-upload1.pdf>
<http://website2.xyz/sustainability-report-2024.pdf>
...

Then, we use LangChain PyPDFLoader(url) to load and split the PDF content.

import pandas as pd
from langchain.docstore.document import Document
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_iris import IRISVector
from langchain_community.document_loaders import PyPDFLoader

# Database setup as above...

# Embedding model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")

# Setup Text Splitter
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\\n\\n", "\\n", ". ", "? ", "! "],
    chunk_size=1000,
    chunk_overlap=100,
    length_function=len,
    is_separator_regex=False,
)

# Utility functions
def get_df(docs, pdf_url):
    data = [
        {
            "page": doc.metadata["page"],
            "total_pages": doc.metadata["total_pages"],
            "content": doc.page_content,
        }
        for doc in docs
    ]
    doc_df = pd.DataFrame(data)
    doc_df["source_url"] = pdf_url
    doc_embeddings = model.encode(doc_df["content"], normalize_embeddings=True).tolist()
    doc_df["content_embedding"] = doc_embeddings
    return doc_df
    
def insert_df(doc_df):
    sql = f"""
    INSERT INTO {embedTableName}
    (source_url, page, total_pages, content, content_embedding) 
    VALUES (?, ?, ?, ?, ?, ?, TO_VECTOR(?))
    """
    data = [
        (
            row["source_url"],
            row["page"],
            row["total_pages"],
            row["content"],
            str(row["content_embedding"]),
        )
        for index, row in doc_df.iterrows()
    ]
    results = cursor.executemany(sql, data)

# Main loop
with open("pdf_links.txt") as f:
        links = f.read().split("\\n")
 
for url in links:   
	loader = PyPDFLoader(url)
	documents = loader.load()
	docs = text_splitter.split_documents(documents)
	print(f"Found {len(docs)} docs for url: {url}")
	df = get_df(docs, url)
	insert_df(df)
	
  • First, we set up an embedding model, which we import from sentence_transformers.
  • Next, we define how to split the text of the reports. We use LangChains recursive text splitter that tries to split on provided characters in order until the chunks are small enough.
  • We then create two utility functions: get_df and insert_df. We utilize pandas dataframes to package text content and its metadata for later storage effectively.
  • Finally, we loop over all links in the text file, loading the PDF using LangChain, splitting into smaller chunks, embedding their contents, and inserting them into the IRIS Vector Database.

Step 4: Use vector search with user input

Finally, we will create a chatbot wrapper to perform a vector search based on a company climate description. First, we package the database functionality into an IRIS_Database class:

import iris
import time
import os
import pandas as pd
from sqlalchemy import create_engine

from sentence_transformers import SentenceTransformer

class IRIS_Database:
    def __init__(self):
        ### Vector Database Setup
        username = "demo"
        password = "demo"
        hostname = os.getenv("IRIS_HOSTNAME", "localhost")
        port = "1972"
        namespace = "USER"
        CONNECTION_STRING = f"{hostname}:{port}/{namespace}"

        self.conn = iris.connect(CONNECTION_STRING, username, password)
        self.cursor = self.conn.cursor()

        # Tables
        self.report_table = "BASLABS.ClimateReportsEmbed"

        # Embeddings
        self.embed = SentenceTransformer("all-MiniLM-L6-v2")

    def get_report_section(self, search_query, n=5):
        if not self.conn:
            self.connect()
				# Embed query using the same embedding model as before
        search_vector = self.embed.encode(
            search_query, normalize_embeddings=True
        ).tolist()
        # SQL query to perform vector search
        sql = f"""
            SELECT TOP ? source_url, page, content
            FROM {self.report_table}
            ORDER BY VECTOR_DOT_PRODUCT(content_embedding, TO_VECTOR(?)) DESC
        """
        self.cursor.execute(sql, [n, str(search_vector)])
        # Format results
        results = self.cursor.fetchall()
        return [
            dict(
                source_url=row[0],
                page=str(row[1]),
                content=row[2],
            )
            for row in results
        ]
  • First, we set up the database as before in the __init__ method. Notice that we also load the embedding model to embed our search queries later on.
  • The get_report_section does all of the heavy lifting. First, it embeds the query vector using the same model used when populating the database. Then, we use an SQL query to perform the vector search. The magic lies in ORDER BY VECTOR_DOT_PRODUCT(content_embedding)since it is here, we find the cosine similarities between text embeddings and return the most similar chunks.
  • Finally, we format the results as a Python dictionary and return to the user.

Step 5: Find climate actions

As a final step, we can create a front-facing client application called vector search. Below, we include a basic chat interface without a language model. But feel free to extend it to your needs! See our DevPost for more details on setting up an agentic tool calling workflow using DAIN!

import os
import sys
import json
from IRIS_Database import IRIS_Database

def get_report_section(search_query: str) -> list[dict]:
    """
    Perform RAG search for report data.
    """
    db = BAS_Database()
    result = db.get_report_section(search_query)
    return result
    
# ask user for
while True:
	print("Please tell me a bit more about your company and your goals. (q to quit)")
	user_input = input("> ")
	if user_input.lower().startswith("q"):
		break
		
	results = get_report_section(user_input)
	print(f"Found {len(results)} matches!")
	for r in results:
		print(r)

Wrapping up

And that’s it! We hope this can inspire you to use an embedding search to match individuals, companies, and even nations with more ambitious climate actions. Thank you to InterSystems for providing these services.

If you have any questions, please leave them in the comments below.

Happy InterSystems Hacking!

Authors: Alice, Suze, and Bubble

ディスカッション (0)0
続けるにはログインするか新規登録を行ってください
質問
· 2025年3月1日

How to use upload manager to upload data to codetable tables.

I am trying to upload using Upload manager functionality in Trakcare , but unable to do so without any guide or documentation.

If anyone can assist me through that would be greatly appreciated.

Thanks.

1 Comment
ディスカッション (1)1
続けるにはログインするか新規登録を行ってください
記事
· 2025年3月1日 6m read

Leveraging InterSystems IRIS for Health Data Analytics with Explainable AI and Vector Search

Introduction

To achieve optimized AI performance, robust explainability, adaptability, and efficiency in healthcare solutions, InterSystems IRIS serves as the core foundation for a project within the x-rAI multi-agentic framework. This article provides an in-depth look at how InterSystems IRIS empowers the development of a real-time health data analytics platform, enabling advanced analytics and actionable insights. The solution leverages the strengths of InterSystems IRIS, including dynamic SQL, native vector search capabilities, distributed caching (ECP), and FHIR interoperability. This innovative approach directly aligns with the contest themes of "Using Dynamic SQL & Embedded SQL," "GenAI, Vector Search," and "FHIR, EHR," showcasing a practical application of InterSystems IRIS in a critical healthcare context.

1 Comment
ディスカッション (1)1
続けるにはログインするか新規登録を行ってください
お知らせ
· 2025年3月1日

Developer Community Recap, February 2025

Hello and welcome to the February 2025 Developer Community Recap.
General Stats
150 new posts published in February:
 44 new articles
 31 new announcements
 73 new questions
 2 new discussions
377 new members joined in February
14,572 posts published all time
15,378 members joined all time
Top posts
Top authors of the month
Articles
#InterSystems IRIS
Correctly manage the system time of my IRIS container
By Sylvain Guilbaud
Run Your InterSystems Solution In Your Kubernetes Environment With Guaranteed Quality of Service
By Ariel Glikman
SQL Host Variables missing ?
By Robert Cemper
Monitoring InterSystems IRIS with Prometheus and Grafana
By Stav Bendarsky
REST Service in IRIS Production: When You Crave More Control Over Raw Data
By Aleksandr Kolesov
Generation of OpenAPI Specifications
By Alessandra Carena
Using DocDB in SQL, almost
By Dmitry Maslennikov
Streamlining Interoperability with Embedded Python in InterSystems IRIS
By Anila Kosaraju
IntegratedML Configuration and Application in InterSystems IRIS
By André Dienes Friedrich
IRIS %Status and Exceptions
By Ashok Kumar
IRIS %Status and Exceptions Part-2
By Ashok Kumar
Orchestrating a local LLM with IRIS Interoperability
By Chris Stewart
Using SQL Gateway with Python, Vector Search, and Interoperability in InterSystems Iris - Part 1 - SQL Gateway
By Julio Esquerdo
Using SQL Gateway with Python, Vector Search, and Interoperability in InterSystems Iris - Part 2 – Python and Vector Search
By Julio Esquerdo
Using SQL Gateway with Python, Vector Search, and Interoperability in InterSystems Iris - Part 3 – REST and Interoperability
By Julio Esquerdo
Getting Started Using Istio Service Mesh with Mirrored IRIS Environment in Kubernetes
By Ariel Glikman
A look at Dynamic SQL and Embedded SQL
By Andre Larsen Barbosa
Using REST API, Flask and IAM with InterSystems IRIS - Part 1 - REST API
By Julio Esquerdo
Using REST API, Flask and IAM with InterSystems IRIS - Part 2 – Flask App
By Julio Esquerdo
Using REST API, Flask and IAM with InterSystems IRIS - Part 3 – IAM
By Julio Esquerdo
Having trouble with your IRIS instance after installing it on Linux? Filecheck to the rescue!
By Luis Angel Pérez Ramos
Back to Basics: Why InterSystems?
By Daniel Cole
HTTP and HTTPS with REST API
By Julio Esquerdo
JWT Creation and Integration in InterSystems IRIS
By Ashok Kumar
Importing json object with a large string property
By Theo Stolker
OMOP Odyssey - Celebration (House of Hades)
By Ron Sweeney
REST API with Swagger in InterSystems IRIS
By Julio Esquerdo
Securing HL7 Interfaces with SSL/TLS (X.509) Certificates
By Eric Fortenberry
Keeping IRISTEMP Under Control: Stop It from Taking Over Your Storage!
By Mario Sanchez Macias
Modern and Easy-to-Use VSCode Plugin for InterSystems ObjectScript: Class Diagram Visualization with PlantUML
By Jinyao
Introducing UDP Adapter
By Corentin Blondeau
How Do You Display %Status Error in Terminal Or Let's Have a New Helpful Terminal Alias!
By Evgeny Shvarov
The Case for IRIS and JavaScript
By Rob Tweed
 
#InterSystems IRIS for Health
 
#Other
 
#Developer Community Official
 
#Health Connect
 
Announcements
#InterSystems IRIS
[Webinar] Configuring a Web Server for use with InterSystems IRIS
By Anastasia Dyubaylo
What's new with InterSystems Language Server 2.7
By Raj Singh
[Video] Make Upgrades Easier - Introducing the Upgrade Impact Checklist
By Anastasia Dyubaylo
[Video] Globals in Embedded Python for Dummies
By Anastasia Dyubaylo
[Video] FHIR Repository and FHIR SQL Builder
By Anastasia Dyubaylo
[Hebrew Webinar]: GenAI + RAG - Leveraging Intersystems IRIS as your Vector DB
By Ronnie Hershkovitz
Alert: SQL Queries Returning Wrong Results
By Daniel Palevski
Beta Testers Needed for our Upcoming InterSystems IRIS Developer Professional Certification Exam
By Celeste Canzano
[Video] FHIR Performance Benchmarking
By Anastasia Dyubaylo
Reminder: Seeking Exam Design Feedback for InterSystems ObjectScript Specialist Certification Exam
By Celeste Canzano
[Video] Coding InterSystems ObjectScript with Copilot
By Anastasia Dyubaylo
[Video] Unit Testing with IPM
By Anastasia Dyubaylo
InterSystems AI Programming Contest: Vector Search, GenAI and AI Agents
By Anastasia Dyubaylo
 
#Global Summit
 
#Developer Community Official
 
#Open Exchange
 
#InterSystems IRIS for Health
 
#Job Opportunity
 
#Other
 
#InterSystems Ideas Portal
 
#InterSystems Official
 
#Learning Portal
 
Questions
#InterSystems IRIS
How to use %ValidateObject with %Dynamic properties?
By Gabriel Santos
keep only the COS routine object (cache object script) to preserve the source routine.mac
By Robert Cemper
Connecting IRIS and VS Code on Windows 11 PRO
By Tulio Dantas
To configure VSCode as Studio for Intersystems
By Manoj R
Base64 decoding and storing it in a local path
By Vachan Rannore
Py fails in Win Terminal - fixed
By Robert Cemper
trying to put the SYSMGR global into a readable string
By JIM WHEELER
How to find all globals in routines?
By Anna Golitsyna
Error encountered while calling an API
By Anthony Decorte
How to check if the system user with the name "John" already exists in IRIS via SQL?
By Evgeny Shvarov
Error - SFTP connection with ssh key on a server that excludes the rsa-ssh algorithm
By Moussa SAMB
Track the global set location
By Ashok Kumar
Sorting, remove duplicate and count of Json file
By Sandeep
I don't know what I did wrong.
By Chiho Han
Visual Studio Code eating licenses
By Fabio Care
Where is InterSystems IRIS Download for Windows?
By Irina Yaroshevskaya
CRLF to LF
By Dmitrij Vladimirov
How to force stop a Live job
By Thembelani Mlalazi
ssldefs.ini file error [Matching SSL server config not found]
By Chiho Han
Change %Response.ContentLength
By omer
What causes a broker RESPONSE to have Transfer-Encoding:chunked?
By omer
Can the WebGateway Port changed from i.e. 57773 to 433?
By Ditmar Tybussek
How to append to a list in a for loop using ObjectScript
By Preedhi Garg
How to log index errors as warning
By Tommy Heyding
Disable Shared Memory in JDBC Driver URL
By Matheus Gomide
How to get IPM module name from source directory
By Pravin Barton
Cube Compilation Order in UCR: How to Manage Dependencies?
By David.Satorres6134
Apache server .../temp/ does not exist, limiting Gateway functionality
By Marcel den Ouden
Need to query the rule data from the tables in the InterSystems database
By Rick Rowbotham
How To Test REST API Methods without referring to a Web Server?
By Evgeny Shvarov
Problem sending Mail via smtp-mail.outlook.com / OAUTH2 as SMTP or JSON
By Ditmar Tybussek
Reading HTTP multipart/form messages from EnsLib.HTTP.GenericMessage object.
By Federico Sandrinelli
Management portal not working
By john smith
How to Get the Adapter State and Status of a Business Operation
By Bransen Smith
Open IRIS studio and management portal in Linux
By john smith
How to set XMLIGNORENULL for non %String property?
By Joe Fu
Can I find out which user created an Operation in Iris?
By Aaron Vail
file path in FTP outbound adapter
By Krishnaveni Kapu
how to convert %Stream.FileBinaryGzip into %Stream.Object?
By Dmitrij Vladimirov
Net Gateway How to instantiate a Class with a Constructor with a parameter
By Thembelani Mlalazi
How to create new special parameter?
By Gabriel Santos
 
#InterSystems IRIS for Health
Load ObjectScript external file and run code
By Phillip Wu
How to use DTL remove property for a repeating segment in for each loop for target using copy
By Mark OReilly
Process multiple messages by increasing pool size while maintaining order with keys
By Thomas Haig
WSGI/Flask app with iris container complains about flask not installed
By John McBride
Need Help to do a connection Between IRIS and SAP RFC
By Mariam Chaaban
Handling FHIR Responses in InterSystems – Transforming to SDA
By Ali Chaib
Transforming a FHIR Response to SDA – Handling Task Resource
By Ali Chaib
Flask, Iris Container and NGINX?
By John McBride
Objectscript get classs property error
By Phillip Wu
VS CODE Search Options
By RKumar
Intersystems Reports (Logi Reports) - Does it merge multiple PDFs?
By Suzy Green
SQL SYS.Database table not found
By Phillip Wu
Error message “Invalid CSRF token”
By Edmara Francisco
Converting Documents to PDF in IRIS
By Thembelani Mlalazi
Program read answers from a OS file
By Phillip Wu
Searching InterSystems FHIR Server for specific Encounter (based on peroid)
By Sebastian Thiele
 
#Ensemble
 
#Caché
 
#Other
 
#Health Connect
 
#HealthShare
 
#TrakCare
Trakcare parameter to Jreport
By Jude Mukkadayil
 
Discussions
#InterSystems IRIS
 
February, 2025Month at a GlanceInterSystems Developer Community
ディスカッション (0)1
続けるにはログインするか新規登録を行ってください
質問
· 2025年3月1日

How can find assistance for global master Rewards

I redeemed a reward from global master's community but I am unable to track or got an update.

I require assistance with that.

2 Comments
ディスカッション (2)1
続けるにはログインするか新規登録を行ってください