From Spreadsheet Nightmares to Data Dreams: How (Un)Perplexed Spready Rewrites the Rules
- Written by Super User
- Category: (Un)Perplexed Spready
It’s 3 p.m. on a Thursday, and you’re staring at a spreadsheet that feels like a puzzle with no solution. Rows of jumbled text – product descriptions, prices, random units – mock you from the screen. You’ve got a deadline looming, but instead of strategizing, you’re neck-deep in VLOOKUPs, manually copying data, and praying your formulas don’t break. Sound familiar? At Matasoft.hr, we’ve been there too – and we decided it’s time for a change. Enter (Un)Perplexed Spready, the AI-powered spreadsheet software that turns data nightmares into dreams of efficiency and insight.
This isn’t just a tool – it’s a revolution. Built with cutting-edge large language models (LLMs), (Un)Perplexed Spready doesn’t ask you to wrestle with your data; it steps in to do the hard work for you. From chaos to clarity, from frustration to focus, it’s here to rewrite the story of how we manage data – and hand you the pen.
The Old Way: A Tale of Time Lost
Picture this: a small business owner, Ana, runs a growing online store. She gets a CSV dump from a supplier – 500 rows of products, each a tangle of text like “Super Gadget, 2.3 kg, $59, matte black.” She needs sizes, prices, and categories sorted by tomorrow’s meeting. In Excel, it’s a marathon – hours of splitting cells, writing formulas, and double-checking errors. By the time she’s done, the day’s gone, and she’s too drained to plan her next move.
That’s the old way. It’s a story repeated daily across offices, homes, and industries – a quiet crisis of wasted potential. Traditional spreadsheets are like loyal but outdated friends: they try their best, but they’re not equipped for the messiness of modern data. At Matasoft, we knew there had to be a better chapter.
The New Way: AI as Your Co-Author
Now imagine Ana with (Un)Perplexed Spready. She opens the same file, types =ASK_LOCAL1(A2, "Extract the weight and convert it to pounds") and gets “5.07 lbs” instantly. She follows with =PERPLEXITY1(A2, "Is this electronics, home goods, or apparel?") and sees “electronics” pop up like magic. Curious about trends, she adds =ASK_LOCAL2(A2, B2, "What’s the common selling point here?") and discovers both items tout “durability.” In 15 minutes, her data’s organized, categorized, and ready to shine – leaving her free to strategize instead of slog.
That’s the new way. (Un)Perplexed Spready doesn’t just tweak the edges of spreadsheet life – it reimagines the core, embedding AI that understands your intent and delivers results with human-like smarts. Here’s how it rewrites the rules:
• Data Extraction Without the Drama
No more dissecting text by hand. A command like =ASK_LOCAL1(A2, "Pull the price and convert it to EUR") turns a tangled cell into clean, usable info – fast and without the manual grind.
• Categorization That Clicks
Sorting feels effortless with =PERPLEXITY1(A2, "Does this fit under food, drink, or snacks?"). AI grasps context, not just keywords, making sense of the toughest datasets.
• Insights That Inspire
Dig deeper with =ASK_LOCAL2(A2, B2, "Are these items aimed at the same audience?") or =ASK_LOCAL2(A2, B2, "Which is more eco-friendly?"). It’s like having a data analyst in every cell.
• Standardization, Simplified
Units all over the place? Type =ASK_LOCAL1(A2, "Convert this length to meters") and watch AI unify your numbers without a fuss.
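The formulas above hand extraction and unit conversion to the language model. For simple, regular cells like Ana's, a deterministic parser makes a handy spot-check of the AI's answers. The Python sketch below is illustrative only: `parse_product` and `kg_to_lb` are hypothetical helpers, not part of (Un)Perplexed Spready, and they assume cells shaped like "name, 2.3 kg, $59, color".

```python
import re

KG_TO_LB = 2.20462  # pounds per kilogram (rounded factor)


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds, rounded to two decimals."""
    return round(kg * KG_TO_LB, 2)


def parse_product(cell: str) -> dict:
    """Pull the weight (kg) and USD price out of a comma-separated product cell.

    Hypothetical helper: it only understands patterns like "2.3 kg" and "$59"
    and returns None for a field it cannot find instead of guessing.
    """
    weight = re.search(r"(\d+(?:\.\d+)?)\s*kg\b", cell, re.IGNORECASE)
    price = re.search(r"\$\s*(\d+(?:\.\d+)?)", cell)
    return {
        "weight_kg": float(weight.group(1)) if weight else None,
        "price_usd": float(price.group(1)) if price else None,
    }


row = parse_product("Super Gadget, 2.3 kg, $59, matte black")
print(row)                         # {'weight_kg': 2.3, 'price_usd': 59.0}
print(kg_to_lb(row["weight_kg"]))  # 5.07
```

When the deterministic value agrees with the AI answer (here 5.07 lb, as in Ana's example), you have a cheap sanity check before trusting a whole column.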
This isn’t about bells and whistles – it’s about giving you back control over your time and your data.
A Cast of Characters: Who’s This For?
(Un)Perplexed Spready isn’t a niche player – it’s a hero for anyone who touches data. Here’s how it stars in different stories:
• The Entrepreneur: A startup founder turns raw sales figures into growth plans, extracting key metrics and spotting patterns in minutes.
• The Marketer: A campaign manager categorizes feedback, compares ad performance, and pulls insights – all before lunch.
• The Educator: A teacher organizes student data, standardizes grades, and analyzes progress without losing nights to prep.
• The Logistics Pro: A supply chain expert converts global shipment stats, categorizes cargo, and flags delays with ease.
• You: Whatever your role, if data’s part of your plot, (Un)Perplexed Spready writes a happier ending.
The Plot Twist: Time Becomes Yours Again
The real victory isn’t just cleaner data – it’s what happens next. With (Un)Perplexed Spready, the hours once lost to tedium become yours to reclaim. For Ana, it’s time to dream up new products. For a manager, it’s space to rally the team. For a freelancer, it’s an evening off instead of another late night.
AI isn’t here to dazzle with sci-fi flair – it’s here to lift the weight so you can soar. At Matasoft, we’ve harnessed that power to create a tool that’s as practical as it is profound, blending advanced tech with the spreadsheet simplicity you already know.
Matasoft: Crafting the Next Chapter
Matasoft isn’t just coding software – we’re crafting stories of change. (Un)Perplexed Spready is our latest tale, born from a passion to solve real problems and a vision to push productivity forward. We’ve walked the path of spreadsheet struggles, and we’ve paved a way out – not with hype, but with heart and ingenuity.
Write Your Own Ending
Your data has a story to tell – don’t let it stay buried in frustration. With (Un)Perplexed Spready, you’re not just managing numbers – you’re shaping outcomes. Ready to turn the page? Visit https://matasoft.hr/qtrendcontrol/index.php/un-perplexed-spready to explore how it can star in your workflow today.
This is more than a spreadsheet upgrade – it’s a chance to rewrite how you work, one smarter cell at a time. Let’s turn nightmares into dreams, together.
#DataManagement #SpreadsheetRevolution #AIProductivity #DataExtraction #AutomatedAnalytics #WorkSmarter #AITools #BusinessEfficiency #DataAnalytics #FutureOfWork #SpreadsheetAutomation #ProductivityHack #SmartSpreadsheets #DigitalTransformation #AutomationTools #WorkplaceInnovation #DataDriven #DataScience #AI #ArtificialIntelligence #LLM #SpreadsheetSoftware #BusinessIntelligence
Get Started!
Join the revolution today. Let (Un)Perplexed Spready free you from manual data crunching and unlock the full potential of AI—right inside your spreadsheet. Whether you're a business analyst, a researcher, or just an enthusiast, our powerful integration will change the way you work with data.
You can find more practical information on how to set up and use the (Un)Perplexed Spready software here: Using (Un)Perplexed Spready
Download
Download the (Un)Perplexed Spready software: Download (Un)Perplexed Spready
Request Free Evaluation Period
When you run the application, you will be presented with the About form, where you will find an automatically generated Machine Code for your computer. Send us an email specifying your machine code and asking for a trial license. We will send you a trial license key that unlocks the premium AI functions for a limited time period.
Contact us at the following email:
Sales Contact
Purchase commercial license
For the price of two beers a month, you can have a faithful co-worker, the AI-whispering spreadsheet software, doing the hard work while you drink your coffee!
You can purchase the commercial license here: Purchase License for (Un)Perplexed Spready
AI-driven Spreadsheet Processing Services
We also offer AI-driven spreadsheet processing services with the (Un)Perplexed Spready software.
If you need large-scale data extraction, data categorization, data classification, data annotation or data labeling, check out our corresponding services here: AI-driven Spreadsheet Processing Services
Further Reading
Download (Un)Perplexed Spready
Purchase License for (Un)Perplexed Spready
Say Goodbye to Spreadsheet Chaos: (Un)Perplexed Spready Brings AI to the Rescue
- Written by Super User
- Category: (Un)Perplexed Spready
Data is power – until it’s a mess. If you’ve ever spent hours untangling a spreadsheet, wrestling with formulas that won’t cooperate, or manually scrubbing data just to make it usable, you know the struggle is real. At Matasoft, we’ve heard the cries of frustration from small business owners, analysts, and managers alike – and we’ve answered with (Un)Perplexed Spready. This isn’t just another spreadsheet tool; it’s an AI-powered lifeline, built from the ground up to turn data chaos into effortless order.
Powered by advanced large language models (LLMs), (Un)Perplexed Spready takes the familiar grid you know and supercharges it with intelligence that understands, organizes, and analyzes your data like never before. No more late nights fighting with Excel. No more swearing at broken macros. Just a smarter way to work – and a chance to take back your time.
The Problems We All Face
Let’s break it down. Traditional spreadsheets like Excel are brilliant for basic tasks, but they stumble when the stakes get higher. Here are the headaches keeping you up at night – and why they’re so hard to solve the old way:
• Unstructured Data Mess: Product descriptions, customer notes, or supplier lists often come as freeform text – “Deluxe Widget, 16 oz, $25, eco-friendly.” Extracting specifics means tedious manual work or complex string formulas that break too easily.
• Inconsistent Formatting: Units bounce between pounds and kilos, prices mix currencies, and categories are a guessing game. Standardizing it all by hand is a soul-crushing slog.
• Time-Consuming Analysis: Want to compare items or spot trends? You’re stuck cross-referencing rows, building pivot tables, or writing scripts – hours of effort for a glimmer of insight.
• Error-Prone Processes: One typo in a formula, one missed cell, and your whole dataset’s suspect. The more you do manually, the higher the risk.
These aren’t small hiccups – they’re productivity killers. And in a world where data drives decisions, they’re holding you back from what really matters.
The Solution: (Un)Perplexed Spready’s AI Edge
(Un)Perplexed Spready doesn’t patch these problems – it obliterates them. By embedding AI directly into your spreadsheet cells, it transforms how you interact with data. Here’s how it tackles the chaos:
• Extract Data Like a Pro
Got a messy cell? Type =ASK_LOCAL1(A2, "Pull the weight from this description and convert it to grams") and watch “16 oz” become “453.59 g” in seconds. No regex, no fuss – just precision.
• Categorize with Confidence
End the guesswork with =PERPLEXITY1(A2, "Is this furniture, decor, or lighting?"). AI reads the context and sorts it accurately, saving you hours of eyeballing rows.
• Analyze Without the Grind
Need deeper insights? Use =ASK_LOCAL2(A2, B2, "What’s the key difference between these two items?") or =ASK_LOCAL2(A2, B2, "Which has the better price-to-weight ratio?"). Answers flow effortlessly, no pivot tables required.
• Standardize in a Snap
Mixed units or currencies? Type =ASK_LOCAL1(A2, "Convert this price to USD and round it") and let AI unify your data instead of converting each cell by hand. Consistency becomes automatic.
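The arithmetic behind the ounce-to-gram example, and what a "price-to-weight ratio" comparison boils down to, fits in a few lines of Python. This is a hedged sketch for cross-checking AI answers, not how (Un)Perplexed Spready computes anything; the helper names are hypothetical and the second product is invented for illustration.

```python
import re

GRAMS_PER_OUNCE = 28.349523125  # avoirdupois ounce, exact by definition
UNIT_TO_GRAMS = {"oz": GRAMS_PER_OUNCE, "g": 1.0, "kg": 1000.0}


def weight_in_grams(cell: str):
    """Return the weight in grams from text like '16 oz' or '500 g', else None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(oz|g|kg)\b", cell, re.IGNORECASE)
    if match is None:
        return None
    return float(match.group(1)) * UNIT_TO_GRAMS[match.group(2).lower()]


def price_usd(cell: str):
    """Return a '$25'-style price as a float, else None."""
    match = re.search(r"\$\s*(\d+(?:\.\d+)?)", cell)
    return float(match.group(1)) if match else None


def price_per_100g(cell: str):
    """Price-to-weight ratio in USD per 100 g, or None if a field is missing."""
    grams, price = weight_in_grams(cell), price_usd(cell)
    if grams is None or price is None or grams == 0:
        return None
    return round(price / grams * 100, 2)


a = "Deluxe Widget, 16 oz, $25, eco-friendly"
b = "Basic Widget, 500 g, $24"  # hypothetical second item
print(round(weight_in_grams(a), 2))        # 453.59
print(price_per_100g(a), price_per_100g(b))  # 5.51 4.8
```

16 oz comes out at 453.59 g, matching the figure above, and the per-100 g prices make the "better price-to-weight ratio" question explicit.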
This is AI that doesn’t just talk a big game – it delivers, right where you need it.
Who’s Ready for Rescue?
(Un)Perplexed Spready is built for the real world, not just the tech elite. It’s a game-changer for anyone who’s felt the sting of spreadsheet chaos:
• Business Owners: Turn supplier data into inventory plans fast, leaving time to grow your bottom line.
• Data Analysts: Skip the prep and jump to analysis, uncovering insights that impress clients or bosses.
• Operations Teams: Standardize logistics stats, categorize shipments, and streamline workflows without breaking a sweat.
• Creative Professionals: Organize project budgets or client feedback, freeing your mind for the big ideas.
• Everyday Users: From personal finance to hobby tracking, make data work for you, not against you. If you’ve got data, (Un)Perplexed Spready has your back.
The Payoff: More Than Just Time Saved
Sure, (Un)Perplexed Spready slashes hours off your workload – but the real win is what you do with that freedom. It’s the meeting you nail because you had time to prep. It’s the strategy you craft instead of copying cells. It’s the coffee break you take, knowing the job’s already done. This isn’t about working faster – it’s about living better, with AI as your silent partner.
At Matasoft, we see this as the future: technology that doesn’t overwhelm but empowers, fitting seamlessly into the tools you already use. No coding skills needed, no steep learning curve – just a spreadsheet that finally keeps up with you.
Matasoft: Solving Today, Shaping Tomorrow
Matasoft is on a mission to rethink how we work. (Un)Perplexed Spready is more than a product – it’s our promise to you: less chaos, more clarity. We’ve poured our expertise into every feature, crafting a tool that’s as reliable as it is revolutionary. This is AI with purpose, built by people who get it.
Take Control of Your Data Today
Why let spreadsheets dictate your day when they can elevate it? (Un)Perplexed Spready is your ticket out of the chaos and into a world where data works for you. Ready to see the difference? Visit https://matasoft.hr/qtrendcontrol/index.php/un-perplexed-spready to explore how it can transform your workflow now.
The era of spreadsheet struggles is over. With (Un)Perplexed Spready, you’re not just surviving data – you’re thriving with it. Let’s make chaos a thing of the past, together.
Further Reading
Download (Un)Perplexed Spready
Purchase License for (Un)Perplexed Spready
(Un)Perplexed Spready with web search enabled Ollama models
Various Articles about (Un)Perplexed Spready
Why choose QDeFuZZiner software?
- Written by Super User
- Category: QDeFuZZiner - Fuzzy Data Matching, Merging and De-duplication software
What is QDeFuZZiner?
QDeFuZZiner is an invaluable tool for anyone looking to perform data matching, merging or de-duplication. Whether you're a data scientist, business analyst, or simply someone looking to make sense of complex data sets, QDeFuZZiner can help you achieve your goals. QDeFuZZiner can help businesses and organizations that rely on large amounts of data by providing fuzzy data matching, data merging, and data de-duplication capabilities.

It is powerful yet intuitive software that can identify linked or similar records containing keyboard errors, missing words, extra words, nicknames, changed surnames, or multicultural name variations. It can also help you merge and consolidate product and customer lists from multiple sources, and identify and link together the same entities, such as the same customers or products, across two different datasets. Additionally, it can be used to minimize duplicate customer data and accurately link each data record to one customer identity. A free version, QDeFuZZiner Lite, has all the features of the full commercial version; its only limitation is that it imports a maximum of 10,000 rows per dataset.
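To make "similar records with keyboard errors" concrete, here is a small illustration using Python's standard `difflib.SequenceMatcher`. It is a generic similarity measure chosen for brevity; this article does not describe QDeFuZZiner's own algorithms, and this is not them.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


pairs = [
    ("Jonathan Smith", "Jonathon Smith"),   # keyboard error
    ("Jonathan Smith", "jonathan  SMITH"),  # case and spacing only
    ("Jonathan Smith", "Maria Lopez"),      # unrelated person
]
for left, right in pairs:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

A typo costs only a little similarity, while an unrelated name scores far lower; the job of a matching tool is to decide where the cut-off between the two lies.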
Key business benefits of choosing QDeFuZZiner
Organizations in a wide variety of industries can benefit from the use of QDeFuZZiner fuzzy data matching, data merging, and data de-duplication software. This software facilitates the process of accurately identifying, matching, and merging large amounts of data, which can help streamline processes and save businesses time and money.
For example, in the healthcare industry, QDeFuZZiner can help detect and merge duplicated patient records, allowing healthcare providers to maintain a single, comprehensive patient profile and improve patient care.
In the retail industry, QDeFuZZiner can help detect and merge customer data from different sources, allowing businesses to gain a comprehensive view of their customer base and create more personalized marketing campaigns.
In the finance sector, QDeFuZZiner can help identify, validate, and merge data from disparate sources, supporting institutions in detecting fraud and money laundering and in managing financial risk.
In the manufacturing industry, QDeFuZZiner can help detect and merge data from different production sources, allowing businesses to quickly and accurately assess production performance and improve efficiency.
Overall, QDeFuZZiner fuzzy data matching, data merging, and data de-duplication software can help a wide variety of businesses and industries save time, streamline their processes, and improve the accuracy and efficiency of their data management.
Main features
The main features of the QDeFuZZiner software include:
- a robust back-end PostgreSQL database, capable of storing, indexing and processing heavy input datasets;
- an intuitive and interactive front-end desktop GUI application;
- the ability to import input datasets from spreadsheet and flat (csv) files;
- intuitive organization of fuzzy data matching projects;
- intuitive creation of multiple solutions inside each project;
- an interactive user interface for defining various fuzzy matching parameters;
- definition of exact matching constraints, fuzzy matching constraints and other constraints;
- a graphical tool for visualizing the similarity distribution of matches and non-matches in a solution table;
- interactive datagrids with integrated searching, filtering, sorting and customization capabilities;
- the integrated spreadsheet software "Spready" for analyzing input datasets and resultsets;
- the ability to export resultsets into spreadsheet files (.xlsx, .xls, .ods) or flat files (.csv, .txt, .tab).
The major benefits of using QDeFuZZiner software include lower cost, faster time to market, ability to identify linked or similar records, ability to merge and consolidate product and customer lists, and ability to minimize duplicate customer data and accurately link each data record to one customer identity.
QDeFuZZiner is considered one of the best fuzzy data matching tools for several reasons:
Advanced Algorithms: QDeFuZZiner uses advanced algorithms that are specifically designed for fuzzy data matching. These algorithms are able to accurately match data even when there are variations in spelling, format, or other inconsistencies.
High Accuracy: QDeFuZZiner delivers high accuracy results, which is essential when working with fuzzy data. This means that users can trust the results produced by the software, which in turn increases the efficiency of their data analysis and decision making.
Intuitive Interface: QDeFuZZiner has an intuitive interface that is easy to use, even for non-technical users. This means that users do not need to have a background in computer science or data analysis to take advantage of the software's capabilities.
Customizability: QDeFuZZiner allows users to customize the data matching process to meet their specific needs. This includes the ability to set various parameters for data matching or de-duplication, with merging capabilities.
Scalability: QDeFuZZiner is able to handle large data sets, making it an ideal solution for organizations of all sizes.
Wide range of industries: QDeFuZZiner offers a number of features that can be used to quickly and easily analyze data in a wide range of industries, including finance, healthcare, retail and more.
Combined, these factors make QDeFuZZiner, in our view, the best fuzzy data matching software on the market. Its ability to handle complex datasets with ease, its high accuracy, its easy-to-use interface, its customizability, its scalability and its support for a wide range of industries make it an ideal solution for organizations and individuals looking to extract insights from data.
What are businesses and jobs that can be helped by QDeFuZZiner?
QDeFuZZiner's capabilities can improve the accuracy and efficiency of data management tasks in areas such as:
- Customer relationship management (CRM) systems: QDeFuZZiner can help businesses match customer data across different systems and merge duplicate records, improving the accuracy of customer information and reducing the risk of duplicated efforts.
- Marketing and sales: QDeFuZZiner can help businesses match and merge lead and customer data, improving the targeting and segmentation of marketing campaigns and reducing the risk of duplicated efforts.
- Supply chain management: QDeFuZZiner can help businesses match and merge supplier and product data, improving the efficiency of procurement and inventory management processes.
- Human resources: QDeFuZZiner can help businesses match and merge employee data across different systems, improving the accuracy and efficiency of HR processes such as onboarding and performance management.
- Data analytics: QDeFuZZiner can help businesses clean and prepare data for analysis, improving the accuracy and insights of data-driven decision making.
Roles that can benefit from QDeFuZZiner include Data Analyst, Data Scientist, Data Engineer, Business Analyst, Marketing Analyst, Sales Analyst, Procurement Analyst, HR Analyst, and others who handle data-related tasks.
Download QDeFuZZiner
QDeFuZZiner is the perfect choice for those looking for a reliable, cost-effective and efficient fuzzy data matching, record linkage and data deduplication software. Try it today and experience the power of QDeFuZZiner!
Further Reading
Introduction To Fuzzy Data Matching
Managing QDeFuZZiner Projects
Importing Input Datasets into QDeFuZZiner
Managing QDeFuZZiner Solutions
Demo Fuzzy Match Projects
Various Articles on QDeFuZZiner
Our Data Matching Services
Would you like us to perform fuzzy data matching, de-duplication or cleansing of your datasets?
Check out our data matching service here: Data Matching Service
Data Matching Flow
- Written by Super User
- Category: Fuzzy Data Matching, Record Linkage and Data Deduplication
Data Matching Flow with QDeFuZZiner software
To use QDeFuZZiner software successfully, we need to understand the general flow of a data matching project.
The same general principles apply to a data de-duplication project, which differs only in that the same original input dataset is imported twice, as both the left and the right dataset, and the flag "Deduplication (instead of Matching)" is set.
Here is the graphical presentation of a typical data matching project:
Description of the major phases involved in a data matching or de-duplication project:
Project Creation
The first step is to create a new project record.
Input Data Importing
Each project matches two input datasets, called the "left dataset" and the "right dataset", which are imported from .csv files.
In this step, you register both input datasets and then trigger their import into the QDeFuZZiner database, where further data processing takes place.
In a data de-duplication project, the same input dataset has to be registered and imported as both the left and the right dataset.
QDeFuZZiner imports only .csv files directly, so if your input dataset is in another format, such as an Excel spreadsheet, you will first need to export it to a .csv file in UTF-8 encoding. Fortunately, spreadsheet applications generally offer export to .csv. We recommend LibreOffice Calc, which has the most versatile data export options.
As good practice, do some basic preprocessing of the input datasets before importing them, such as trimming whitespace, applying consistent capitalization (upper and lower case) and unifying date formats. Such preparation improves the quality of fuzzy data matching.
It is also advisable to add a column with unique row identifiers if one is not already present. This is always recommended, and for data de-duplication it is a must, because you will need to set up the "<>" operator in the exact matching constraints for the ID columns of the left and right datasets.
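The preprocessing steps above, plus the unique row identifier, can be scripted before export. This Python sketch assumes rows are dictionaries (for example from `csv.DictReader`), that the date layouts listed in `DATE_FORMATS` are the ones present in your file, and that no column is already called `row_id`; all names are hypothetical.

```python
from datetime import datetime

# Date layouts expected in the raw file (an assumption; extend as needed).
DATE_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d")


def normalize_date(value: str) -> str:
    """Return the date as ISO YYYY-MM-DD, or the trimmed input if no layout fits."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


def preprocess(rows, name_cols=(), date_cols=()):
    """Trim and collapse whitespace, title-case names, unify dates, add row_id."""
    cleaned = []
    for row_id, row in enumerate(rows, start=1):
        out = {"row_id": row_id}  # unique identifier, needed for "<>" in de-duplication
        for col, val in row.items():
            val = " ".join(str(val).split())  # trim ends and collapse inner spaces
            if col in name_cols:
                val = val.title()
            if col in date_cols:
                val = normalize_date(val)
            out[col] = val
        cleaned.append(out)
    return cleaned


raw = [{"name": "  ana   KOVAČ ", "born": "03.07.1985"},
       {"name": "Ana Kovac", "born": "1985-07-03"}]
print(preprocess(raw, name_cols={"name"}, date_cols={"born"}))
```

Note that `str.title()` is a blunt instrument (it turns "McDonald" into "Mcdonald"), so review name columns before relying on it.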
Solution Creation and Definition (i.e. setting up the data matching model)
After the input datasets are imported into the QDeFuZZiner database, the next step is to create a new Solution and define an initial data matching model, which we will refine later.
Adding Columns into Data Matching Constraints
Using the Fields Picker tool, we add column pairs from the left and right datasets to the applicable sections to build our data matching model.
Available sections for adding column pairs are: Exact Matching Relations, Fuzzy Matching Relations, Other Constraints and Merged Columns.
After adding data matching constraints to the applicable sections, we are ready to fine-tune our model.
Setting Up Exact Matching Constraints
By default, column pairs added to Exact Matching Constraints are assigned the "=" (equal) operator. However, in a data de-duplication project we need to use the "<>" (not equal) operator instead on the ID columns of the left and right datasets. This is important because we don't want to compare a row of the original dataset with itself (remember that in a data de-duplication project we import the same original dataset twice, as both the left and the right dataset).
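The effect of "=" and "<>" exact matching constraints can be illustrated on a tiny dataset compared with itself. This naive Python sketch walks the full Cartesian product, which is fine for three rows but is exactly what the blocking phase exists to avoid on real data; the helper name and sample rows are hypothetical.

```python
from itertools import product


def candidate_pairs(left, right, equal_cols=(), not_equal_cols=()):
    """Yield (l, r) record pairs that satisfy simple exact-matching constraints.

    equal_cols     -> columns that must be equal ("=" operator)
    not_equal_cols -> columns that must differ ("<>" operator), e.g. the ID
                      column when one dataset is loaded twice for de-duplication
    """
    for l, r in product(left, right):  # naive: every combination is checked
        if all(l[c] == r[c] for c in equal_cols) and \
           all(l[c] != r[c] for c in not_equal_cols):
            yield l, r


rows = [
    {"id": 1, "town": "Zagreb", "name": "Ana Kovač"},
    {"id": 2, "town": "Zagreb", "name": "Ana Kovac"},
    {"id": 3, "town": "Split", "name": "Ivan Horvat"},
]
# De-duplication: the same rows on both sides, same town required, IDs must differ.
dedup = list(candidate_pairs(rows, rows, equal_cols=("town",), not_equal_cols=("id",)))
print([(l["id"], r["id"]) for l, r in dedup])  # [(1, 2), (2, 1)]
```

With "<>" on the ID column a row is never paired with itself, but each duplicate pair appears twice, once in each direction.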
Setting Up Fuzzy Matching Constraints
In this section, we define a relative weight for each column pair. By default, every column pair gets the same relative weight, i.e. the same importance, which is usually not optimal.
QDeFuZZiner provides two alternative tools for automatically setting recommended relative weights. However, these tools are not perfect: judge their suggestions critically and adjust the relative weights manually afterwards. Finding good relative weights is typically a matter of trial and error; you will experiment with slight variations of the model until you get a satisfactory result.
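One common way to combine per-column similarities is a weighted average. The article does not specify QDeFuZZiner's exact formula, so treat this Python sketch as an assumption that only illustrates why equal default weights are rarely optimal; the scores and weights are hypothetical.

```python
def weighted_similarity(column_scores: dict, weights: dict) -> float:
    """Weighted average of per-column similarity scores in the 0..1 range."""
    total = sum(weights[c] for c in column_scores)
    if total == 0:
        raise ValueError("at least one column needs a positive weight")
    return sum(column_scores[c] * weights[c] for c in column_scores) / total


scores = {"name": 0.95, "street": 0.40, "town": 1.00}  # hypothetical record pair
equal = {"name": 1, "street": 1, "town": 1}            # default: same importance
tuned = {"name": 3, "street": 1, "town": 1}            # name judged most telling
print(round(weighted_similarity(scores, equal), 3))  # 0.783
print(round(weighted_similarity(scores, tuned), 3))  # 0.85
```

Raising the weight of the most discriminating column pulls the overall score toward that column's evidence; whether that is right depends on your data, hence the trial-and-error tuning.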
Setting Up Other Constraints
This section is used to define additional exact matching constraints on individual columns from the left or right dataset.
Use it if you wish to restrict the data model to a custom subset, for example to a certain town or gender.
Such constraints must be manually defined.
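Conceptually, an "other constraint" behaves like a filter on one dataset, so fewer rows take part in pairing. The hypothetical Python helper below (`apply_other_constraints`) illustrates the idea for equality conditions only; it does not reproduce how QDeFuZZiner evaluates these constraints in its database.

```python
def apply_other_constraints(rows, **required):
    """Keep only rows whose columns equal the required values, e.g. town='Zagreb'."""
    return [r for r in rows
            if all(r.get(col) == val for col, val in required.items())]


left = [{"id": 1, "town": "Zagreb"}, {"id": 2, "town": "Split"}]
print(apply_other_constraints(left, town="Zagreb"))  # [{'id': 1, 'town': 'Zagreb'}]
```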
Setting Up Merged Columns
"Merged Columns" is a powerful but complex section with many parameters and options. You can use it to create additional merged columns in the final resultset, and also to merge (consolidate) duplicate rows.
It is important to understand that merging is performed not only horizontally (i.e. across a matching row), but also vertically (i.e. across all matching rows for the same matched entity). This is especially important for de-duplication, where the consolidation options let you de-duplicate while preserving the data of all duplicate records. In other words, you can enrich the surviving rows with data from the duplicate rows during the de-duplication process.
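Vertical consolidation first needs to know which rows belong to the same entity, and then needs a rule for what survives. The Python sketch below groups matched pairs with a small union-find (so chains like A~B and B~C end up as one entity) and keeps the lowest ID as the survivor, filling its empty fields from its duplicates. Both the survivor rule and the helper names are assumptions for illustration; QDeFuZZiner's Merged Columns section offers its own consolidation options, which are not reproduced here.

```python
def find_groups(ids, matched_pairs):
    """Group record IDs into duplicate clusters using a small union-find."""
    parent = {i: i for i in ids}

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in matched_pairs:
        parent[root(a)] = root(b)
    groups = {}
    for i in ids:
        groups.setdefault(root(i), []).append(i)
    return [sorted(g) for g in groups.values()]


def consolidate(records, matched_pairs):
    """Keep one survivor per cluster (lowest ID), filling its blanks from duplicates."""
    by_id = {r["id"]: r for r in records}
    survivors = []
    for group in find_groups(list(by_id), matched_pairs):
        survivor = dict(by_id[group[0]])
        for dup_id in group[1:]:
            for col, val in by_id[dup_id].items():
                if survivor.get(col) in (None, "") and val not in (None, ""):
                    survivor[col] = val
        survivors.append(survivor)
    return survivors


records = [
    {"id": 1, "name": "Ana Kovač", "email": "", "phone": "091 111 222"},
    {"id": 2, "name": "Ana Kovac", "email": "ana@example.com", "phone": ""},
    {"id": 3, "name": "Ivan Horvat", "email": "ivan@example.com", "phone": ""},
]
# Both directions appear, as they would when a dataset is compared with itself.
print(consolidate(records, matched_pairs=[(1, 2), (2, 1)]))
```

The surviving Ana Kovač record keeps its phone number and gains the e-mail address that only the duplicate had.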
Solution Execution
Once the initial data matching model is defined, we can execute it to retrieve a resultset.
Typically, this is a cycle of multiple executions, resultset inspections, data model adjustments and fine-tuning, until you get a satisfactory result.
Once the data model is optimized, you can reuse it for repeated executions with freshly imported data: just re-import the new data and execute the already saved data matching model.
A) Solution Execution in 3 Consecutive Steps
For the initial adjustments and fine-tuning of the data model, use this three-step approach.
Solutions are saved as records in a table of solutions, where each record represents a solution and contains the definition of the parameters and constraints to be applied. Solution execution actually involves two separate sub-phases, which we call the "blocking phase" and the "detailed fuzzy match phase". The result of the first phase is the so-called "solution base table", while the result of the second phase is the final resultset.
The blocking phase, i.e. the creation of the solution base table, is a time-consuming operation which, depending on the size of the datasets and the number of columns included in the fuzzy match comparison, can take anything from a few minutes to a few hours or even a few days to finish! In contrast, the final resultset is created in a matter of seconds or minutes. It therefore makes sense to follow this recommended three-step approach: first define the solution parameters and constraints, then create the solution base table (blocking phase), then open the similarity distribution tool to visually determine the range of optimal threshold values, and finally vary the threshold values within that range, executing the detailed fuzzy match phase until you get satisfactory results.
1. Execute Blocking Phase
The "blocking" phase is the phase in which a subset of the best-matching candidate record pairs is chosen from the whole universe of possible combinations.
The blocking phase is actually a sequence of two distinct, consecutive sub-phases:
a) Sub-phase of rough similarity filtration (blocking)
Rough similarity filtering passes through the best candidates (those record pairs whose string similarity is greater than the blocking similarity limit) and saves them into an intermediate table called the "solution base table".
b) Detailed similarity calculation
After the best candidates are saved, a detailed similarity calculation takes place for each passed-through record pair.
The most important parameter to define for the blocking phase is the "blocking similarity limit", a similarity threshold used during blocking. The term "blocking" designates the phase in which the Cartesian product of all possible combinations of records from the left and right datasets is narrowed down to a much smaller subset of combinations, according to some blocking similarity criteria. This sub-phase is very important because, for medium and large datasets, the detailed fuzzy match calculation would become infeasible (extremely time-consuming) if every possible combination were compared and analyzed in the detailed similarity calculation sub-phase: two datasets of 10,000 rows each already yield 100,000,000 possible pairs.
The higher the blocking similarity limit, the fewer record pairs are saved in the solution base table, and consequently the faster the next phase (detailed fuzzy match phase) runs. However, if the blocking similarity limit is too high, we risk omitting true matches.
When a solution definition is executed, QDeFuZZiner creates a table for the solution, which we call the "solution base table". This table is built from combinations of left and right dataset records, according to the exact and fuzzy matching constraints and the blocking similarity limit. The solution base table thus contains the subset of left and right record combinations that satisfy the blocking similarity limit. Only the combinations saved into this table are then analyzed in the detailed fuzzy match sub-phase.
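A minimal sketch of the blocking idea: index one dataset by character trigrams so that only pairs sharing at least one trigram are scored, then keep pairs whose similarity exceeds the blocking limit. The trigram scheme (each lower-cased word padded with two leading spaces and one trailing space, similarity = shared trigrams / distinct trigrams) is modeled loosely on PostgreSQL's pg_trgm extension; whether QDeFuZZiner uses this exact measure is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def trigrams(text: str) -> set:
    """Character trigrams of each lower-cased word, padded roughly like pg_trgm."""
    grams = set()
    for word in text.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: set, b: set) -> float:
    """Shared trigrams divided by distinct trigrams (Jaccard)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def build_base_table(left, right, blocking_limit):
    """Return (left_idx, right_idx, similarity) for pairs above the blocking limit.

    An inverted index from trigram to right-row indices means only rows that
    share at least one trigram are ever scored, instead of the full
    Cartesian product of left x right.
    """
    right_grams = [trigrams(r) for r in right]
    index = defaultdict(set)
    for j, grams in enumerate(right_grams):
        for g in grams:
            index[g].add(j)
    base = []
    for i, text in enumerate(left):
        lg = trigrams(text)
        candidates = set().union(*(index.get(g, ()) for g in lg))
        for j in sorted(candidates):
            s = similarity(lg, right_grams[j])
            if s > blocking_limit:
                base.append((i, j, round(s, 3)))
    return base


left = ["Jonathan Smith", "Maria Lopez"]
right = ["Jonathon Smith", "Mario Lopes", "Petra Novak"]
for limit in (0.2, 0.4, 0.6):
    print(limit, build_base_table(left, right, limit))
```

Raising the limit from 0.4 to 0.6 drops the "Maria Lopez" / "Mario Lopes" pair, which scores exactly 0.5: a smaller base table and a faster detailed phase, at the cost of a possibly missed true match.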
Besides the blocking similarity limit, the parameter "Use dictionaries (yes/no)" also influences the creation of the solution base table. If a dictionary is used, the strings used in the blocking and detailed phases are reduced to lexemes according to the selected dictionary, and the lexemes are then used for similarity calculation instead of the original words. This can be useful for long strings, such as verbose product descriptions, because lexemization reduces the variation among related word forms.
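To see why lexemization helps, compare word sets before and after reducing words to a common stem. The suffix stripper below is a deliberately crude, English-only stand-in for a real dictionary (PostgreSQL text search, for instance, ships far more careful Snowball stemmers); the rules, names and example strings are all assumptions for illustration.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # toy rule set, checked in this order


def lexeme(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def token_jaccard(a: str, b: str, use_dictionary: bool) -> float:
    """Jaccard similarity of word sets, optionally after lexemization."""
    norm = lexeme if use_dictionary else (lambda w: w)
    ta = {norm(w) for w in a.lower().split()}
    tb = {norm(w) for w in b.lower().split()}
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


a = "Charging connectors"
b = "charged connector"
print(token_jaccard(a, b, use_dictionary=False))  # 0.0
print(token_jaccard(a, b, use_dictionary=True))   # 1.0
```

Without lexemes the two descriptions share no words at all; with them, "charging"/"charged" and "connectors"/"connector" collapse to the same tokens.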
Of course, exact and fuzzy match constraints also influence the blocking phase. Adding exact matching constraints can dramatically reduce the execution time of the blocking phase and also improve the accuracy of the fuzzy matching model.
Immediately after the rough filtering of candidate record pairs, a detailed string similarity calculation is executed on the passed-through records.
The overall result of the blocking phase is an intermediate table stored in the database, called the "solution base table", which contains record pairs with their calculated string similarity values.
2. Analyze Similarity Function Distribution
After the blocking phase is executed and the solution base table is saved in the database, we can investigate the similarity function distribution visually, in order to determine an appropriate similarity threshold for discerning matches from non-matches. QDeFuZZiner provides an advanced tool for graphically representing the similarity function distribution, along with mathematical functions that give a hint of what the optimal threshold might be.
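The article does not say which functions QDeFuZZiner uses to suggest a threshold. As one well-known, swapped-in option, Otsu's method (originally from image thresholding) can be applied to a list of similarity scores: it picks the cut that best separates the scores into two groups. This Python sketch and its sample scores are illustrative assumptions only.

```python
def otsu_threshold(similarities):
    """Pick the cut that maximizes between-class variance (Otsu's method).

    Candidate thresholds are midpoints between consecutive distinct values;
    scores above the cut are treated as likely matches, below as non-matches.
    Returns None if all values are identical.
    """
    values = sorted(similarities)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two similarity values")
    best_t, best_var = None, -1.0
    for k in range(1, n):
        if values[k - 1] == values[k]:
            continue  # no cut possible between identical values
        low, high = values[:k], values[k:]
        w0, w1 = len(low) / n, len(high) / n
        mu0, mu1 = sum(low) / len(low), sum(high) / len(high)
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = (values[k - 1] + values[k]) / 2, between
    return best_t


# Hypothetical similarities from a solution base table: a non-match cluster and a match cluster.
scores = [0.20, 0.25, 0.30, 0.35, 0.85, 0.90, 0.92, 0.95]
print(otsu_threshold(scores))  # 0.6
```

Treat such a value as a starting point only; real distributions overlap, which is why the threshold is then varied and the resultsets inspected.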

3. Get Final Resultset
In this phase, the value of the "similarity threshold" parameter is used to discern matches from non-matches. The detailed fuzzy match phase creates and saves a resultset table, which is then loaded into the datagrid, from which it can be exported into a spreadsheet or flat file.
Besides the similarity threshold, this phase is also influenced by the exact and fuzzy matching constraints, and by the "Join Type" and "Return only best matching record (yes/no)" parameters.
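Because the expensive work already sits in the solution base table, producing a resultset is essentially a filter over it, which is why varying the threshold is cheap. The Python sketch below assumes base-table rows of the form (left_id, right_id, similarity). It also assumes that pairs at or above the threshold count as matches, that "Return only best matching record" keeps the top-scoring right record per left record, and that a left-style join type keeps unmatched left records. These semantics are illustrative guesses, not QDeFuZZiner's documented behavior, and all names are hypothetical.

```python
def final_resultset(base_rows, left_ids, threshold, best_only=True, left_join=True):
    """Turn solution-base-table rows into a resultset.

    base_rows : iterable of (left_id, right_id, similarity)
    threshold : pairs with similarity >= threshold count as matches
    best_only : keep only the highest-scoring right record per left record
    left_join : also emit unmatched left records, with right_id/similarity None
    """
    matches = {}
    for left_id, right_id, sim in base_rows:
        if sim >= threshold:
            matches.setdefault(left_id, []).append((right_id, sim))
    result = []
    for left_id in left_ids:
        found = sorted(matches.get(left_id, []), key=lambda m: m[1], reverse=True)
        if best_only:
            found = found[:1]
        if found:
            result.extend((left_id, r, s) for r, s in found)
        elif left_join:
            result.append((left_id, None, None))
    return result


base = [(1, "a", 0.91), (1, "b", 0.74), (2, "c", 0.52), (3, "d", 0.88)]
for t in (0.5, 0.8):  # re-running only filters the existing base table
    print(t, final_resultset(base, left_ids=[1, 2, 3, 4], threshold=t))
```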
B) Solution Execution in 1 Step
Execution in one step is suitable for re-running a solution on updated (re-imported) input datasets, when you expect that the newly imported data will not substantially change the already defined data matching model.
Resultset Exporting
After a solution is executed, the resultset is saved as a new table in the database and presented in a datagrid, where we can filter, sort and search the results and export them into a spreadsheet.


