Posing a question to consider during the current Grand Prix competition.
I wanted to share an observation about using PDFs with LangChain.
When loading the text out of a PDF, I noticed an artifact: gaps appearing inside some of the extracted words.
For example (highlighted in red)
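For reference, this is roughly how the text was pulled out. A minimal sketch only, assuming the PyPDF loader and a local file named "example.pdf" (both stand-ins for whatever document and loader you use):

```python
# Hedged sketch of the extraction step: PyPDFLoader and "example.pdf"
# are placeholders for your own loader and document.
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example.pdf")
pages = loader.load()

# Inspect a page to spot the intra-word gaps,
# e.g. "informa tion" rather than "information".
print(pages[0].page_content[:500])
```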
It was concerning that this would affect:
1) The quality of document search for related content
2) The ability of the OpenAI model to generate answers
What might be needed to stitch these words back together to improve things?
Could this use a word dictionary?
What would be the risk of linking two separate words together?
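As a thought experiment, dictionary-based stitching could look something like the sketch below. This is hypothetical (and, as it turned out, unnecessary); it assumes a plain word list as the "dictionary", and the risk in the last question above is exactly the case the merge condition tries to avoid:

```python
# Hypothetical sketch of dictionary-based stitching: join two adjacent
# fragments only when the merged form is a known word and the first
# fragment on its own is not. The risk is still merging two legitimate
# separate words, so this stays conservative.
def stitch_fragments(tokens, dictionary):
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            merged = tokens[i] + tokens[i + 1]
            if merged.lower() in dictionary and tokens[i].lower() not in dictionary:
                out.append(merged)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out

# Example: "informa tion retrieval" -> ["information", "retrieval"]
words = {"information", "retrieval"}
print(stitch_fragments("informa tion retrieval".split(), words))
```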
Pushing ahead, the unanticipated outcome was:
- It didn't make a difference to either the document search or the ability to generate answers.
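For context, the check had roughly the shape below. A hedged sketch only: FAISS stands in here for whatever vector store you pair with IRIS, OpenAI credentials are assumed to be configured, and the query string is just a placeholder. The gapped text and the stitched text behaved equivalently in both steps:

```python
# Rough shape of the test: embed the extracted pages, run a similarity
# search, then generate an answer over the retrieved content.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

texts = [page.page_content for page in pages]  # pages from the loader sketch above
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# 1) Document search for related content
hits = store.similarity_search("What does the document say about <topic>?", k=3)

# 2) Answer generation over the retrieved content
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=store.as_retriever())
print(qa.run("What does the document say about <topic>?"))
```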
I suspect this is down to the way that OpenAI's encoding and tokenization operate.
The number of tokens is usually higher than the number of words.
So tokens are often already "partial" words that follow one another.
Thus the stray spaces in the middle of words didn't affect the answers.
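You can see this for yourself with the tiktoken package (assuming the cl100k_base encoding here; the exact splits depend on the model):

```python
# Compare how an intact word and its "gapped" form are tokenized.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["information", "informa tion"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```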
Please share your own experiences of ghosts / curious effects when using LangChain with IRIS.