AI Analytics Sports

AI Analytics Sports — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Zardoz (computer security)

    Zardoz (computer security)

    In computer security, the Security-Digest list, better known as the Zardoz list, was a semi-private full disclosure mailing list run by Neil Gorsuch from 1989 through 1991. It identified weaknesses in systems and gave directions on where to find them. It was a perennial target for computer hackers, who sought archives of the list for information on undisclosed software vulnerabilities.

    == Membership restrictions ==

    Access to Zardoz was approved on a case-by-case basis by Gorsuch, principally by reference to the user account used to send subscription requests; requests were approved for root users, valid UUCP owners, or system administrators listed at the NIC. The openness of the list to users other than Unix system administrators was a regular topic of conversation, with participants expressing concern that vulnerabilities and exploitation details disclosed on the list were liable to spread to hackers. The circulation of Zardoz postings was an open secret among computer hackers, and was mocked in a Phrack parody of an IRC channel populated by security experts.

    == Notable participants ==

    - Keith Bostic discussed BSD Sendmail vulnerabilities
    - Chip Salzenberg discussed Peter Honeyman's posting of a UUCP worm, and shell script security
    - Gene Spafford discussed VMS and Ultrix bugs, and relayed law enforcement enquiries about the Morris Worm
    - Tom Christiansen discussed SUID shell scripts
    - Chris Torek discussed devising exploits from general descriptions of vulnerabilities
    - Henry Spencer discussed Unix security
    - Brendan Kehoe discussed systems security
    - Alec Muffett announced Crack, the Unix password cracker

    The majority of Zardoz participants were Unix systems administrators and C software developers. Neil Gorsuch and Gene Spafford were the most prolific contributors to the list.

    Read more →
  • AI Writing Assistants: Free vs Paid (2026)

    AI Writing Assistants: Free vs Paid (2026)

    Curious about the best AI writing assistant? An AI writing assistant is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI writing assistant slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • IBM alignment models

    IBM alignment models

    The IBM alignment models are a sequence of increasingly complex models used in statistical machine translation to train a translation model and an alignment model, starting with lexical translation probabilities and moving to reordering and word duplication. They underpinned the majority of statistical machine translation systems for almost twenty years starting in the early 1990s, until neural machine translation began to dominate. These models offer a principled probabilistic formulation and (mostly) tractable inference. The IBM alignment models were published in parts in 1988 and 1990, and the entire series was published in 1993. Every author of the 1993 paper subsequently went to the hedge fund Renaissance Technologies. The original work on statistical machine translation at IBM proposed five models, and a Model 6 was proposed later. The sequence of the six models can be summarized as:

    - Model 1: lexical translation
    - Model 2: additional absolute alignment model
    - Model 3: extra fertility model
    - Model 4: added relative alignment model
    - Model 5: fixed deficiency problem
    - Model 6: Model 4 combined with an HMM alignment model in a log-linear way

    == Mathematical setup ==

    The IBM alignment models treat translation as a conditional probability model. For each source-language ("foreign") sentence $f$, we generate both a target-language ("English") sentence $e$ and an alignment $a$. The problem then is to find a good statistical model for $p(e, a \mid f)$, the probability that we would generate the English sentence $e$ and an alignment $a$ given a foreign sentence $f$. The meaning of an alignment grows increasingly complicated as the model version number grows. See Model 1 for the simplest and most understandable version.
    == Model 1 ==

    === Word alignment ===

    Given any foreign-English sentence pair $(e, f)$, an alignment for the sentence pair is a function of type $\{1, \dots, l_e\} \to \{0, 1, \dots, l_f\}$. That is, we assume that the English word at location $i$ is "explained" by the foreign word at location $a(i)$. For example, consider the following pair of sentences:

        It will surely rain tomorrow -- 明日 は きっと 雨 だ

    We can align some English words to corresponding Japanese words, but not every one:

        it -> ?
        will -> ?
        surely -> きっと
        rain -> 雨
        tomorrow -> 明日

    This in general happens due to the different grammar and conventions of speech in different languages. English sentences require a subject, and when there is no subject available, English uses the dummy pronoun "it". Japanese verbs do not have different forms for future and present tense, and the future tense is implied by the noun 明日 (tomorrow). Conversely, the topic marker は and the grammar word だ (roughly "to be") do not correspond to any word in the English sentence. So, we can write the alignment as

        1 -> 0; 2 -> 0; 3 -> 3; 4 -> 4; 5 -> 1

    where 0 means that there is no corresponding alignment. Thus, we see that the alignment function is in general a function of type $\{1, \dots, l_e\} \to \{0, 1, \dots, l_f\}$. Future models will allow one English word to be aligned with multiple foreign words.

    === Statistical model ===

    Given the above definition of alignment, we can define the statistical model used by Model 1. Start with a "dictionary". Its entries are of the form $t(e_i \mid f_j)$, which can be interpreted as saying "the foreign word $f_j$ is translated to the English word $e_i$ with probability $t(e_i \mid f_j)$".
    After being given a foreign sentence $f$ with length $l_f$, we first generate an English sentence length $l_e$ uniformly from the range $\{1, 2, \dots, N\}$. In particular, it does not depend on $f$ or $l_f$. Then, we generate an alignment uniformly from the set of all possible alignment functions $\{1, \dots, l_e\} \to \{0, 1, \dots, l_f\}$. Finally, for each English word $e_1, e_2, \dots, e_{l_e}$, generate each one independently of every other English word. For the word $e_i$, generate it according to $t(e_i \mid f_{a(i)})$. Together, we have the probability

    $$p(e, a \mid f) = \frac{1/N}{(1 + l_f)^{l_e}} \prod_{i=1}^{l_e} t(e_i \mid f_{a(i)})$$

    IBM Model 1 uses very simplistic assumptions on the statistical model, in order to allow the following algorithm to have a closed-form solution.

    === Learning from a corpus ===

    If a dictionary is not provided at the start, but we have a corpus of English-foreign sentence pairs $\{(e^{(k)}, f^{(k)})\}_k$ (without alignment information), then the model can be cast into the following form:

    - fixed parameters: the foreign sentences $\{f^{(k)}\}_k$
    - learnable parameters: the entries of the dictionary $t(e_i \mid f_j)$
    - observable variables: the English sentences $\{e^{(k)}\}_k$
    - latent variables: the alignments $\{a^{(k)}\}_k$

    In this form, this is exactly the kind of problem solved by the expectation–maximization algorithm.
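    As a concrete check of the Model 1 formula for $p(e, a \mid f)$ given in the statistical model above, the following minimal sketch evaluates it for the Japanese example sentence. The function name `model1_joint_prob`, the use of `None` for the "no corresponding word" position 0, the value $N = 10$ and every dictionary entry of 0.5 are illustrative assumptions, not values from the source.

```python
from math import prod


def model1_joint_prob(e, f, a, t, N):
    """Sketch of p(e, a | f) under IBM Model 1.

    e: list of English words.
    f: list of foreign words.
    a: alignment; a[i] is the 1-based foreign position that explains English
       word i + 1, or 0 when there is no corresponding foreign word.
    t: dict mapping (english_word, foreign_word) -> probability; the key
       (english_word, None) covers position 0 (an assumption of this sketch).
    N: maximum English sentence length.
    """
    l_e, l_f = len(e), len(f)
    padded = [None] + list(f)  # index 0 stands for "no corresponding word"
    # Product of dictionary entries t(e_i | f_a(i)); missing entries count as 0.
    lexical = prod(t.get((e[i], padded[a[i]]), 0.0) for i in range(l_e))
    # Uniform length prior 1/N and uniform alignment prior 1/(1 + l_f)^l_e.
    return (1.0 / N) / (1 + l_f) ** l_e * lexical


e = ["it", "will", "surely", "rain", "tomorrow"]
f = ["明日", "は", "きっと", "雨", "だ"]
a = [0, 0, 3, 4, 1]  # the alignment 1->0; 2->0; 3->3; 4->4; 5->1 from the text

# Made-up dictionary entries, chosen only to make the arithmetic easy to follow.
t = {
    ("it", None): 0.5,
    ("will", None): 0.5,
    ("surely", "きっと"): 0.5,
    ("rain", "雨"): 0.5,
    ("tomorrow", "明日"): 0.5,
}

p = model1_joint_prob(e, f, a, t, N=10)
```

    With these made-up numbers the result is $(1/10) / 6^5 \cdot 0.5^5$, which shows how quickly the uniform alignment prior shrinks the probability of any single alignment.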
    Due to the simplistic assumptions, the algorithm has a closed-form, efficiently computable solution, which is the solution to the following equations:

    $$\begin{cases} \max_{t'} \sum_k \sum_i \sum_{a^{(k)}} t(a^{(k)} \mid e^{(k)}, f^{(k)}) \ln t'(e_i^{(k)} \mid f_{a^{(k)}(i)}^{(k)}) \\ \sum_x t'(e_x \mid f_y) = 1 \quad \forall y \end{cases}$$

    Here $t(a^{(k)} \mid e^{(k)}, f^{(k)})$ is the posterior probability of the alignment under the current dictionary $t$, and $t'$ is the updated dictionary. This can be solved by Lagrangian multipliers, then simplified. For a detailed derivation of the algorithm, see chapter 4 and. In short, the EM algorithm goes as follows:

    INPUT. A corpus of English-foreign sentence pairs $\{(e^{(k)}, f^{(k)})\}_k$.

    INITIALIZE. A matrix of translation probabilities $t(e_x \mid f_y)$. This could either be uniform or random. It is only required that every entry is positive, and that for each $y$ the probabilities sum to one: $\sum_x t(e_x \mid f_y) = 1$.

    LOOP. Until $t(e_x \mid f_y)$ converges:

    $$t(e_x \mid f_y) \leftarrow \frac{t(e_x \mid f_y)}{\lambda_y} \sum_{k,i,j} \frac{\delta(e_x, e_i^{(k)}) \, \delta(f_y, f_j^{(k)})}{\sum_{j'} t(e_i^{(k)} \mid f_{j'}^{(k)})}$$

    where each $\lambda_y$ is a normalization constant that makes sure each $\sum_x t(e_x \mid f_y) = 1$.

    RETURN. $t(e_x \mid f_y)$.

    In the above formula, $\delta$ is the Kronecker delta: it equals 1 if the two entries are equal, and 0 otherwise.
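    The INPUT / INITIALIZE / LOOP / RETURN procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the name `train_model1`, the fixed iteration count used in place of a convergence test, the `None` entry standing for foreign position 0, and the three-sentence toy corpus are all choices made for this sketch.

```python
from collections import defaultdict

NULL = None  # assumption: position 0 of every foreign sentence is a NULL word


def train_model1(corpus, iterations=10):
    """EM sketch for IBM Model 1.

    corpus: list of (english_words, foreign_words) sentence pairs.
    Returns a dict t with t[(e, f)] approximating t(e | f).
    """
    e_vocab = sorted({e for es, _ in corpus for e in es})
    f_vocab = {f for _, fs in corpus for f in fs} | {NULL}

    # INITIALIZE: uniform, positive, and summing to one over e for each f.
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

    # LOOP: a fixed number of iterations stands in for a convergence test.
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e being explained by f
        total = defaultdict(float)  # normalizer per foreign word (lambda_y)
        for es, fs in corpus:
            fs0 = [NULL] + list(fs)
            for e in es:
                z = sum(t[(e, f)] for f in fs0)  # sum over j' in the formula
                for f in fs0:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        t = {(e, f): count[(e, f)] / total[f] for e in e_vocab for f in f_vocab}

    # RETURN
    return t


# Toy corpus, made up for illustration.
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_model1(corpus)
```

    On this toy corpus, "the" ends up as the most probable translation of "das" because the two co-occur in two sentence pairs, even though no alignment was ever given.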
    In the EM update formula, the index notation is as follows: $k$ ranges over English-foreign sentence pairs in the corpus; $i$ ranges over words in English sentences; $j$ ranges over words in foreign-language sentences; $x$ ranges over the entire vocabulary of English words in the corpus; $y$ ranges over the entire vocabulary of foreign words in the corpus.

    === Limitations ===

    There are several limitations to IBM Model 1.

    - No fluency: given any sentence pair $(e, f)$, any permutation of the English sentence is equally likely: $p(e \mid f) = p(e' \mid f)$ for any permutation of the English sentence $e$ into $e'$.
    - No length preference: the probability of each length of translation is equal: $\sum_{e \text{ has length } l} p(e \mid f) = \frac{1}{N}$ for any $l \in \{1, 2, \dots, N\}$.
    - Does not explicitly model fertility: some foreign words tend to produce a fixed number of English words. For example, for German-to-English translation, "ja" is usually omitted, and "zum" is usually translated to one of "to the", "for the", "to a", "for a".

    == Model 2 ==

    Model 2 allows alignment to be conditional on sentence lengths. That is, we have a probability distribution $p_a(j \mid i, l_e, l_f)$

    Read more →
  • General Internet Corpus of Russian

    General Internet Corpus of Russian

    General Internet Corpus of Russian (GICR) is a corpus of Russian internet texts that has been accessible on request through an online query interface since 2013. The corpus includes rich text materials from the blogosphere, social networks, major news sources and literary magazines.

    == Goals of the project ==

    The project has educational and scientific status, and independent researchers and research groups use materials obtained from GICR to solve many tasks in computational linguistics. While other Russian corpus projects focus on fiction and edited texts, the General Internet Corpus gives linguists a timely opportunity to study the language as it is, with all its slang and regional peculiarities. The corpus supports research in:

    - Linguistic research of a wide range: dialectological research, study of word distribution, study of the language of social networks, study of the influence of gender, age and other factors on the language, frequency of words, fixed expressions and different constructions, stylistic features of texts from different segments of the Internet, etc.
    - Social media analysis
    - Corpus-based machine learning for evaluating automatic tagging

    At various times, student papers and independent research have been carried out on the project material by students, graduates and employees of MSU, MIPT, Russian State Humanitarian University, Novosibirsk State University, Higher School of Economics, Russian Academy of Sciences, SFU, CSU, SGMP, and IAAS of MSU.

    Scientific project leaders:

    - Belikov V. - RSUH, Moscow, Russia
    - Selegey V. - RSUH, ABBYY, Moscow, Russia
    - Sharoff S. - RSUH, Moscow, Russia; University of Leeds, UK

    The organizations involved in support of GICR:

    - Russian State University of Humanities
    - ABBYY Company
    - Moscow Institute of Physics and Technology
    - Skolkovo Institute of Science and Technology

    == Size and content of the corpus ==

    As of summer 2016, the corpus size is 19.8 billion tokens, of which 49% are from VKontakte, 40% from LiveJournal, another 4% from Mail.ru Blogs and News, and 2% from Russian Magazine Hall. The sources collected in the news segment are RIA Novosti, Regnum, Lenta.ru, and Rosbalt. Texts are provided with meta-markup (date of creation of the text; sex, place and year of birth of the author; Internet genre; etc.); all texts are provided with automatic morphological tagging and lemmatization. Most of the collected texts were created in 2013–2014, although some segments, such as Russian Magazine Hall, include texts dating back to 1994. GICR is one of the few current mega-corpus projects, meaning that its available size reaches several billion words.

    == Access ==

    The GICR interface is currently in beta, so search access to the corpus is free but is available to researchers on request.

    Read more →
  • MSpy

    MSpy

    mSpy is a brand of mobile and computer parental control monitoring software for iOS, Android, Windows, and macOS. The app monitors and logs user activity on the client device and sends the data to a personalized dashboard. Data that users can monitor includes text messages, calls, GPS locations, social media chats, and more. It is owned by Virtuoso Holding.

    == History ==

    mSpy was launched as a product for mobile monitoring by Altercon Group in 2010. In 2012, the application began allowing parents to monitor not only smartphones but also computers running Windows and macOS. In 2013, mSpy won the TopTenReviews award for cell phone monitoring software. By 2014, the business had grown nearly 400%, and the app's user numbers exceeded 1 million. In 2015, mSpy received the Parents Tested Parents Approved (PTPA) Winner's Seal of Approval in the United States. In 2015 and 2018, mSpy was the victim of data breaches which released user data. In 2016, mLite, a light version of mSpy, became available from Google Play. The same year, it was awarded the kidSAFE Certified Seal in the United States.

    In 2017, mSpy collaborated with YouTuber and journalist Coby Persin to conduct a social experiment on the dangers of social media and online predators. In the experiment, conducted with parental consent, Persin befriended three children (aged 12, 13, and 14) via Snapchat and then invited them to meet in person. Each of the participants agreed to the meeting and arrived at the designated location. The video of the experiment received widespread attention and helped to raise awareness about the importance of online security and parental controls.

    In early 2021, mSpy released a new feature, Screenrecorder, which allows parents to take screenshots of a child's screen while the child is using certain apps. In 2024, mSpy's Zendesk was compromised by an unknown threat actor, revealing its customer list. As of 2025, mSpy is compatible with Android, iPhone, and iPad devices. It provides access to various types of data stored on the device, including contact information, calendar entries, emails, SMS messages, browser history, photos, videos, and installed applications. Functions also include GPS tracking, geofencing, and keyword alerts.

    == Reception ==

    It was noted that since mSpy runs inconspicuously, there is a risk of the software being used illegally. mSpy was called "terrifying" by The Next Web and was featured in NPR coverage of spyware used against victims of stalking and other domestic violence. In response, mSpy released security updates aimed at reducing the risk of misuse and stated that it "uses encryption protocols to protect user data and that access is restricted to the account holder".

    In May 2015, Brian Krebs reported that mSpy had been hacked, leaking personal data for hundreds of thousands of users of devices with mSpy installed. mSpy claimed that there was no data leak, but that instead it was the victim of blackmailers. In September 2018, Krebs claimed and demonstrated that anyone could easily gain access to the mSpy database containing data for millions of users. The company responded by stating that the exposed data consisted primarily of error logs and incorrect login attempts. Following the incident, mSpy implemented new security measures, changed encryption keys, and reset passwords for affected accounts.

    A 2024 Sky News story characterised mSpy as "stalkerware". Leaked customer support messages from mSpy reveal misuse of its app for illegally monitoring partners and children.

    Read more →
  • The Best Free AI Text-to-image Tool for Beginners

    The Best Free AI Text-to-image Tool for Beginners

    Looking for the best AI text-to-image tool? An AI text-to-image tool is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI text-to-image tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Deterministic acyclic finite state automaton

    Deterministic acyclic finite state automaton

    In computer science, a deterministic acyclic finite state automaton (DAFSA) is a data structure that represents a set of strings, and allows for a query operation that tests whether a given string belongs to the set in time proportional to its length. Algorithms exist to construct and maintain such automata while keeping them minimal. DAFSA is the rediscovery of a data structure called Directed Acyclic Word Graph (DAWG), although the same name had already been given to a different data structure which is related to the suffix automaton.

    A DAFSA is a special case of a finite state recognizer that takes the form of a directed acyclic graph with a single source vertex (a vertex with no incoming edges), in which each edge of the graph is labeled by a letter or symbol, and in which each vertex has at most one outgoing edge for each possible letter or symbol. The strings represented by the DAFSA are formed by the symbols on paths in the graph from the source vertex to any sink vertex (a vertex with no outgoing edges). In fact, a deterministic finite state automaton is acyclic if and only if it recognizes a finite set of strings.

    == History ==

    Blumer et al. first defined the term Directed Acyclic Word Graph (DAWG) in 1983. Appel and Jacobsen used the same name for a different data structure in 1988. Independent of earlier work, Daciuk et al. rediscovered the latter data structure in 2000 but called it DAFSA.

    == Comparison to tries ==

    By allowing the same vertices to be reached by multiple paths, a DAFSA may use significantly fewer vertices than the strongly related trie data structure. Consider, for example, the four English words "tap", "taps", "top", and "tops". A trie for those four words would have 12 vertices, one for each of the strings formed as a prefix of one of these words, or for one of the words followed by the end-of-string marker. However, a DAFSA can represent these same four words using only six vertices v0 to v5 and the following edges: an edge from v0 to v1 labeled "t", two edges from v1 to v2 labeled "a" and "o", an edge from v2 to v3 labeled "p", an edge from v3 to v4 labeled "s", and edges from v3 and v4 to v5 labeled with the end-of-string marker. There is a tradeoff between memory and functionality, because a standard DAFSA can tell you if a word exists within it, but it cannot point you to auxiliary information about that word, whereas a trie can.

    The primary difference between a DAFSA and a trie is the elimination of suffix and infix redundancy in storing strings. The trie eliminates prefix redundancy since all common prefixes are shared between strings; for example, "doctors" and "doctorate" share the prefix "doctor". In a DAFSA, common suffixes are also shared, for words that have the same set of possible suffixes as each other. For dictionary sets of common English words, this translates into a major reduction in memory usage.

    Because the terminal nodes of a DAFSA can be reached by multiple paths, a DAFSA cannot directly store auxiliary information relating to each path, e.g. a word's frequency in the English language. However, if for each node we store the number of unique paths through that point in the structure, we can use it to retrieve the index of a word, or a word given its index. The auxiliary information can then be stored in an array.
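    The six-vertex DAFSA for "tap", "taps", "top", and "tops" described above can be written down directly as an edge table. The sketch below is illustrative only: the `$` end-of-string marker, the dictionary-of-dictionaries layout, and the names `contains`, `paths_to_sink` and `index_of` are assumptions of this example. It shows the membership query and how per-vertex path counts yield a word's index, which can then address an array of auxiliary data.

```python
from functools import lru_cache

END = "$"  # assumed end-of-string marker

# Vertex -> {edge label -> target vertex}, following the edges listed in the text.
EDGES = {
    0: {"t": 1},
    1: {"a": 2, "o": 2},
    2: {"p": 3},
    3: {"s": 4, END: 5},
    4: {END: 5},
    5: {},  # sink vertex: no outgoing edges
}


def contains(word):
    """Follow one edge per symbol; accept only if the sink is reached."""
    v = 0
    for ch in word + END:
        v = EDGES[v].get(ch)
        if v is None:
            return False
    return not EDGES[v]


@lru_cache(maxsize=None)
def paths_to_sink(v):
    """Number of distinct paths from vertex v to the sink."""
    if not EDGES[v]:
        return 1
    return sum(paths_to_sink(u) for u in EDGES[v].values())


def index_of(word):
    """0-based rank of word among the stored words in sorted order, or -1."""
    v, rank = 0, 0
    for ch in word + END:
        if ch not in EDGES[v]:
            return -1
        # Every path leaving through a smaller label is a word that sorts earlier.
        for label, target in EDGES[v].items():
            if label < ch:
                rank += paths_to_sink(target)
        v = EDGES[v][ch]
    return rank


ranks = [index_of(w) for w in ("tap", "taps", "top", "tops")]
```

    Because `$` sorts before the letters, a word ranks ahead of its own extensions, so the four words receive the indices 0 to 3 in dictionary order.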

    Read more →
  • AI Code Generators: Free vs Paid (2026)

    AI Code Generators: Free vs Paid (2026)

    Looking for the best AI code generator? An AI code generator is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI code generator slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Test data

    Test data

    Test data are sets of inputs or information used to verify the correctness, performance, and reliability of software systems. Test data encompass various types, such as positive and negative scenarios, edge cases, and realistic user scenarios, and aim to exercise different aspects of the software to uncover bugs and validate its behavior. Test data are also used in regression testing to verify that new code changes or enhancements do not introduce unintended side effects or break existing functionality.

    == Background ==

    Test data may be used to verify that a given set of inputs to a function produces an expected result. Alternatively, data can be used to challenge the program's ability to handle unusual, extreme, exceptional, or unexpected inputs. Test data can be produced in a focused or systematic manner, as is typically the case in domain testing, or through less focused approaches, such as high-volume randomized automated tests. Test data can be generated by the tester or by a program or function that assists the tester. It can be recorded for reuse or used only once. Test data may be created manually, using data generation tools (often based on randomness), or retrieved from an existing production environment. The data set may consist of synthetic (fake) data, but ideally, it should include representative (real) data.

    == Limitations ==

    Due to privacy regulations and standards such as GDPR, PCI, and HIPAA, the use of privacy-sensitive personal data for testing is restricted. However, anonymized (and preferably subsetted) production data may be used as representative data for testing and development. Programmers may also choose to generate synthetic data as an alternative to using real or anonymized data. While synthetic data can offer significant advantages, such as enhanced privacy and flexibility, it also comes with limitations. For instance, generating synthetic data that accurately reflects real-world complexity can be challenging. There is also a risk of synthetic data not fully capturing the nuances of real data, potentially leading to gaps in test coverage.

    == Domain testing ==

    Domain testing is a set of techniques focusing on test data. This includes identifying critical inputs, values at the boundaries between equivalence classes, and combinations of inputs that drive the system toward specific outputs. Domain testing helps ensure that various scenarios are effectively tested, including edge cases and unusual conditions.
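    To make the idea of boundary values between equivalence classes concrete, here is a small sketch. The function under test, `classify_age`, its category cut-offs, and the chosen cases are all hypothetical and exist only for this illustration; the point is that the test data sit exactly on each side of every boundary, plus just outside the valid range.

```python
def classify_age(age):
    """Hypothetical function under test: maps an age in years to a category."""
    if not isinstance(age, int) or age < 0 or age > 120:
        raise ValueError("age out of range")
    if age < 13:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


# Test data chosen at the boundaries between equivalence classes:
# (input, expected result).
BOUNDARY_CASES = [
    (0, "child"), (12, "child"),     # lowest valid value, top of "child"
    (13, "adult"), (64, "adult"),    # bottom and top of "adult"
    (65, "senior"), (120, "senior"), # bottom of "senior", highest valid value
]

# Negative test data: values just outside the valid domain.
INVALID_CASES = [-1, 121]


def run_cases():
    """Return the inputs whose observed behavior differs from the expectation."""
    failures = []
    for value, expected in BOUNDARY_CASES:
        if classify_age(value) != expected:
            failures.append(value)
    for value in INVALID_CASES:
        try:
            classify_age(value)
            failures.append(value)  # should have been rejected
        except ValueError:
            pass
    return failures


failures = run_cases()
```

    Recording the cases as data rather than as inline assertions lets the same set be reused for regression testing after later code changes.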

    Read more →
  • How to Choose an AI Video Generator

    How to Choose an AI Video Generator

    Looking for the best AI video generator? An AI video generator is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI video generator slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • AI Website Builders: Free vs Paid (2026)

    AI Website Builders: Free vs Paid (2026)

    Looking for the best AI website builder? An AI website builder is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI website builder slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Geoffrey J. Gordon

    Geoffrey J. Gordon

    Geoffrey J. Gordon is a professor at the Machine Learning Department at Carnegie Mellon University in Pittsburgh and director of research at the Microsoft Montréal lab. He is known for his research in statistical relational learning (a subdiscipline of artificial intelligence and machine learning) and on anytime dynamic variants of the A* search algorithm. His research interests include multi-agent planning, reinforcement learning, decision-theoretic planning, statistical models of difficult data (e.g. maps, video, text), computational learning theory, and game theory. Gordon received a B.A. in computer science from Cornell University in 1991, and a PhD at Carnegie Mellon in 1999.

    Read more →
  • Cheekd

    Cheekd

    Cheekd is a dating app based in New York City. It was founded in 2010 by Lori Cheek.

    == History ==

    The service debuted with the name "Cheek'd". Founder Lori Cheek appeared on the television program Shark Tank in February 2014, but did not succeed in obtaining funding from any of the five judges. She said Cheek'd only had 1000 subscribers at that time.

    === Business card model ===

    Cheek'd offered two plans, paid and free. For $25, subscribers got a set of 50 business cards that could be given out once someone caught their eye. Each card had a phrase, an online code, and a URL to the subscriber's account. Recipients could look up the giver's profile. In addition to purchasing cards, there was a $9.95 monthly membership fee.

    === Smartphone app ===

    In 2015, the service's name changed from "Cheek'd" to "Cheekd". The new app used Bluetooth technology to alert users whenever a compatible user was within a 30-foot radius, instead of using cards.

    == Patent lawsuit ==

    The original business card-based model for Cheekd had been claimed as a patented process by Lori Cheek, as U.S. patent 8,543,465. In September 2017, a complaint was filed, alleging that the idea was not original to Lori Cheek. Cheek responded, stating that the complaint was baseless and a complete fabrication. The lawsuit Pirri v. Cheek was dismissed in a pre-trial conference in New York's Federal Court on April 5, 2018.

    Read more →
  • Maghi King

    Maghi King

    Margaret (Maghi) Daniel King is a retired British computational linguist known for her work on evaluating the quality of machine translation. She is an honorary professor in the Department of Translation Technology of the University of Geneva in Switzerland, and the former director of the Dalle Molle Institute for Semantic and Cognitive Studies at the University of Geneva.

    == Education and career ==

    King read classics, Ancient History and Philosophy (Greats) at the University of Oxford, worked as a computer programmer, and became a lecturer in the Department of Computation at the University of Manchester Institute of Science and Technology. She moved to the Dalle Molle Institute for Semantic and Cognitive Studies (ISSCO) in 1974. In 1976, ISSCO became part of the University of Geneva, and she continued there, becoming ISSCO's director in 1978. She remained director until her retirement in 2006.

    == Recognition ==

    King is a Fellow of the European Association for Artificial Intelligence (formerly ECCAI), elected in 1999.

    Read more →
  • Trie

    Trie

    In computer science, a trie (, ), also known as a digital tree or prefix tree, is a specialized search tree data structure used to store and retrieve strings from a dictionary or set. Unlike a binary search tree, nodes in a trie do not store their associated key. Instead, each node's position within the trie determines its associated key, with the connections between nodes defined by individual characters rather than the entire key. Tries are particularly effective for tasks such as autocomplete, spell checking, and IP routing, offering advantages over hash tables due to their prefix-based organization and lack of hash collisions. Every child node shares a common prefix with its parent node, and the root node represents the empty string. While basic trie implementations can be memory-intensive, various optimization techniques such as compression and bitwise representations have been developed to improve their efficiency. A notable optimization is the radix tree, which provides more efficient prefix-based storage. While tries store character strings, they can be adapted to work with any ordered sequence of elements, such as permutations of digits or shapes. A notable variant is the bitwise trie, which uses individual bits from fixed-length binary data (such as integers or memory addresses) as keys. == History, etymology, and pronunciation == The idea of a trie for representing a set of strings was first abstractly described by Axel Thue in 1912. Tries were first described in a computer context by René de la Briandais in 1959. The idea was independently described in 1960 by Edward Fredkin, who coined the term trie, pronouncing it (as "tree"), after the middle syllable of retrieval. However, other authors pronounce it (as "try"), in an attempt to distinguish it verbally from "tree". 
== Overview == Tries are a form of string-indexed look-up data structure, which is used to store a dictionary list of words that can be searched on in a manner that allows for efficient generation of completion lists. A prefix trie is an ordered tree data structure used in the representation of a set of strings over a finite alphabet set, which allows efficient storage of words with common prefixes. Tries can be efficacious on string-searching algorithms such as predictive text, approximate string matching, and spell checking in comparison to binary search trees. A trie can be seen as a tree-shaped deterministic finite automaton. == Operations == Tries support various operations: insertion, deletion, and lookup of a string key. Tries are composed of nodes that contain links, which either point to other suffix child nodes or null. As for every tree, each node except the root is pointed to by only one other node, called its parent. Each node contains as many links as the number of characters in the applicable alphabet (although tries tend to have a substantial number of null links). In some cases, the alphabet used is simply that of the character encoding—resulting in, for example, a size of 128 in the case of ASCII. The null links within the children of a node emphasize the following characteristics: Characters and string keys are implicitly stored in the trie, and include a character sentinel value indicating string termination. Each node contains one possible link to a prefix of strong keys of the set. A basic structure type of nodes in the trie is as follows: Node {\displaystyle {\text{Node}}} may contain an optional Value {\displaystyle {\text{Value}}} , which is associated with the key that corresponds to the node. === Searching === Searching for a value in a trie is guided by the characters in the search string key, as each node in the trie contains a corresponding link to each possible character in the given string. 
Following the characters of the key from the root therefore yields the associated value for the given string key. A null link encountered during the search indicates that the key is absent. The search procedure takes two arguments, x and key, which are the pointer to the trie's root node and the string key, respectively. The search operation takes O(m) time, where m is the length of the string parameter key. In a balanced binary search tree, on the other hand, it takes O(m log n) time in the worst case, since key needs to be compared with O(log n) other keys and each comparison takes O(m) time in the worst case. The trie occupies less space than a binary search tree in the case of a large number of short strings, since nodes share common initial string subsequences and store the keys implicitly.

=== Insertion ===

Insertion into a trie is guided by using the characters of the key as indexes into the children array until the last character of the string key is reached. Each node in the trie corresponds to one call of the radix sorting routine, as the trie structure reflects the execution pattern of the top-down radix sort. If null links are encountered before reaching the last character of the string key, new nodes are created. The input value is assigned to the value of the last node traversed, which is the node that corresponds to the key.

=== Deletion ===

Deletion of a key–value pair from a trie involves finding the node corresponding to the key, setting its value to null, and recursively removing nodes that have no children. The procedure begins by examining key; an empty string indicates arrival at the node corresponding to the (original) key, in which case its value is set to null.
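The search, insertion and deletion procedures of this section can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference implementation: keys are assumed to be 7-bit ASCII strings, `None` is used both as the null link and as the "no value" marker (so `None` itself cannot be stored as a value), and the names `Node`, `find`, `insert` and `delete` are chosen for illustration. In `delete`, a node left with no value and no children is pruned by returning `None` to its parent.

```python
from typing import Any, List, Optional

ALPHABET_SIZE = 128  # assumption: keys are 7-bit ASCII strings


class Node:
    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None] * ALPHABET_SIZE
        self.value: Any = None  # None means "no key ends here"


def find(x: Optional[Node], key: str) -> Any:
    """Return the value stored under key, or None if the key is absent."""
    for ch in key:
        if x is None:
            return None
        x = x.children[ord(ch)]
    # A null link at any step means the key is not in the trie.
    return None if x is None else x.value


def insert(x: Node, key: str, value: Any) -> None:
    """Store value under key, creating nodes along the path as needed."""
    for ch in key:
        i = ord(ch)
        if x.children[i] is None:
            x.children[i] = Node()
        x = x.children[i]
    x.value = value


def delete(x: Optional[Node], key: str) -> Optional[Node]:
    """Remove key and return the (possibly pruned) subtrie rooted at x."""
    if x is None:
        return None
    if key == "":
        x.value = None  # arrived at the node for the original key
    else:
        i = ord(key[0])
        x.children[i] = delete(x.children[i], key[1:])
    if x.value is None and all(c is None for c in x.children):
        return None  # no value and no children: prune this node
    return x


root = Node()
insert(root, "tea", 3)
insert(root, "ten", 12)
```

Each of `find` and `insert` touches one node per character of the key, matching the O(m) bound given above. Because `delete` returns the pruned subtrie, a caller that deletes the last remaining key gets `None` back and should treat that as an empty trie.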
During deletion, a node left with a null value and no children is removed from the trie by returning null; otherwise, the node is kept by returning the node itself.

== Replacing other data structures ==

=== Replacement for hash tables ===

A trie can be used to replace a hash table, over which it has the following advantages: Searching for a node with an associated key of size m has a complexity of O(m), whereas an imperfect hash function may have numerous colliding keys, and the worst-case lookup speed of such a table would be O(N), where N denotes the total number of nodes within the table. Tries do not need a hash function for the operation, unlike a hash table; there are also no collisions of different keys in a trie. Within a trie, keys can be efficiently sorted lexicographically. However, tries are less efficient than a hash table when the data is accessed directly on a secondary storage device such as a hard disk drive, which has a higher random access time than main memory.

== Implementation strategies ==

Tries can be represented in several ways, corresponding to different trade-offs between memory use and speed of the operations. Using a vector of pointers to represent a trie consumes enormous space; however, memory use can be reduced at the expense of running time if a singly linked list is used for each node vector, as most entries of the vector contain nil. Techniques such as alphabet reduction may reduce the large space requirements by reinterpreting the original string as a longer string over a smaller alphabet. For example, a string of n bytes can alternatively be regarded as a string of 2n four-bit units. This can reduce memory usage by a factor of eight, but lookups need to visit twice as many nodes in the worst case.
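The alphabet-reduction example above, in which n bytes are read as 2n four-bit units, can be illustrated with a short Python sketch. The helper name `to_nibbles` and the choice to emit the high four bits of each byte first are assumptions made for illustration only.

```python
from typing import List


def to_nibbles(key: bytes) -> List[int]:
    """Reinterpret n bytes as 2n four-bit symbols (high nibble first)."""
    symbols: List[int] = []
    for byte in key:
        symbols.append(byte >> 4)    # upper four bits, in the range 0..15
        symbols.append(byte & 0x0F)  # lower four bits, in the range 0..15
    return symbols


nibbles = to_nibbles(b"to")  # bytes 0x74 and 0x6F
```

A trie keyed on these symbols needs only 16 links per node instead of 256, while each key passes through twice as many nodes; this is presumably where the factor-of-eight estimate comes from.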
Another technique stores the vector of 256 ASCII pointers as a bitmap of 256 bits representing the ASCII alphabet, which dramatically reduces the size of individual nodes.

=== Bitwise tries ===

Bitwise tries are used to address the enormous space requirement of trie nodes in a naive pointer-vector implementation. Each character in the string key set is represented via individual bits, which are used to traverse the trie over a string key. Implementations of this type of trie use vectorized CPU instructions to find the first set bit in a fixed-length key input (e.g. via GCC's __builtin_clz() intrinsic function, which counts leading zero bits). The position of that set bit is then used to index the first item, or child node, in the 32- or 64-entry bitwise tree. Search then proceeds by testing each subsequent bit in the key. This procedure is also cache-local and highly parallelizable due to register independence, and thus performs well on out-of-order execution CPUs.

=== Compressed tries ===

A radix tree, also known as a compressed trie, is a space-optimized variant of a trie in which any node with only one child is merged with its parent; eliminating branches of nodes with a single child improves both space and time metrics. This works best when the trie remains static and the set of stored keys is very sparse within its representation space. One more approach for static tries is to "pack" the trie by storing disjoint

    Read more →