Wednesday, November 16, 2016

Google translate in the wrong direction

No product announcement is what it seems. 

Google Translate, with some fanfare, announced that it's "improving more in a single leap than we’ve seen in the last ten years combined". They provide no sense of how they came to that figure, of course. 

What's worse: they are, like most AI researchers, obsessed by the application of particular technologies, rather than any progress in the understanding of the human brain. "Neural Machines" are just engineering tools. They are inventions. They have nothing to do with human language, and are not "closer" to human language than whatever text-matching formal grammar tool they were using before. They willfully ignore cognitive science, biology, natural science, and linguistics. So their approach will always be wrong. And it seems more wrong now, because they think they are more right.

It's hard to see the difference in "product quality", because they don't let you compare the old product to the new one. 

But let's give it the oldest test in the book. If I take an English paragraph from their press release, translate it into French, and then translate it back into English ... this "new approach" provides grammatically correct gibberish.

“Whereas Phrase-Based Machine Translation (PBMT) breaks an input sentence into words and phrases to be translated largely independently, Neural Machine Translation (NMT) considers the entire input sentence as a unit for translation. The advantage of this approach is that it requires fewer engineering design choices than previous Phrase-Based translation systems.”

"Alors que la traduction machine à base de phrases (PBMT) casse une phrase d'entrée en mots et phrases à traduire en grande partie indépendamment, la traduction machine neuronale (NMT) considère la phrase d'entrée entière comme une unité de traduction. L'avantage de cette approche est qu'elle nécessite moins de choix d'ingénierie que les systèmes de traduction basés sur des expressions antérieures. "

"While machine-based machine translation (PBMT) breaks an input sentence into words and phrases to be translated largely independently, neural machine translation (NMT) considers the entire input sentence as a translation unit. The advantage of this approach is that it requires fewer engineering choices than translation systems based on earlier expressions. "

I think it's obvious that Google is still vulnerable to any competitor who takes natural science seriously. This is true both in translation and in search. 

Saturday, November 06, 2010


Although I use Google Translate all the time, it's a shame that Google is still pursuing an approach to natural language processing that should have been abandoned in the late 1950s, after the publication of Syntactic Structures.

Google is stuck in the idea that a corpus, minced through a decision procedure, can somehow provide automated natural language capabilities, before we've successfully understood the biology of language! It's like assuming that lots of observations of objects moving will somehow provide you with a theory of gravitation.

As a result, Google Translate just doesn't work. The reason it's still useful is that we have this rich biological mechanism available to us, mostly subconscious, that can make corrections and fill in the gaps.

Type anything into Google Translate, then translate back the results. It's very rare that you'll get anything equivalent to your original. It has something like a 99% failure rate, for me.

So I type:

I wonder if this will ever work?

Google Translate renders this into French:

Je me demande si cela va fonctionner?

... which is already wrong (the "ever" has been dropped), but to complete the exercise, switch the translation and you get:

I wonder if this will work?
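
For anyone who wants to repeat the exercise, here is a minimal sketch of the round-trip test in Python. The `translate` function is a hypothetical stand-in, not a real Google API: it is just a lookup table holding the two sentences quoted above. Swap in whatever translation service you have access to. The similarity score from `difflib` is my own crude choice of measure, only meant to show how far the round trip drifts from the original.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a translation service: a lookup table that
# holds only the sentences quoted in this post.
TOY_TABLE = {
    ("en", "fr", "I wonder if this will ever work?"): "Je me demande si cela va fonctionner?",
    ("fr", "en", "Je me demande si cela va fonctionner?"): "I wonder if this will work?",
}


def translate(text, source, target):
    """Toy translator (an assumption for illustration, not a real API)."""
    return TOY_TABLE[(source, target, text)]


def round_trip(text, source="en", pivot="fr"):
    """Translate text into the pivot language, then back again."""
    forward = translate(text, source, pivot)
    return translate(forward, pivot, source)


def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


original = "I wonder if this will ever work?"
returned = round_trip(original)
print(returned)                          # what came back
print(returned == original)              # did the round trip preserve the sentence?
print(similarity(original, returned))    # how close it came
```

An exact-match check is the strictest version of the test. The similarity score is more forgiving, but it cannot tell you whether the meaning survived.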

Don't get me wrong: Google Translate is a useful tool. But, honestly, Google, you cannot get from here to real automated translation, if you're relying upon the techniques of statistical analysis. You need a real computational theory of language, a device that, for language L, to quote Chomsky in 1956, "generates all the grammatical sentences of L and none of the ungrammatical ones". This is an incredibly tough problem: there are no statistical shortcuts. It cannot be done unless you start to keep up with the biologists of language. These linguists have made much progress over the last 54 years.

Monday, September 28, 2009

Empty projects do nothing

Google is straying further from reality.

I must comment on the jaw-droppingly misguided projects listed at Google's Project 10^100. This kind of list "chosen from suggestions" is inevitable when wealthy, highly indoctrinated technologists embark on "saving the world" with no experience in community work, and no apparent information on the world's biggest problems.

* Create more efficient landmine removal programs

First of all, wouldn't stopping the deployment of any more landmines be a more interesting and permanent goal? It would require admitting that the country Google has the most influence in is the largest manufacturer and user of weapons that attack civilians.

Existing removal programs are quite low-tech and effective (replacing hoes with shovels, for example) whereas an actual anti-war anti-aggression movement really needs technology to allow people to organize more fully.

* Drive innovation in public transport

"Develop new transportation technologies to help move more people with less energy, greater efficiency and fewer casualties." You mean, like trams and trains, jitneys and bicycles? We have all the transportation technologies that are needed ... once again, the issue is to get them to happen again, and in the right way: with local manufacture. The only high-tech that would be useful here is the means for people to organize to get the infrastructure that has been taken from them.

* Build real-time, user-reported news service

It seems like Twitter and Facebook already do this ... are we aiming to re-brand them into news organizations? There are hundreds of initiatives like this, and the best ones will already float to the top. Unless hamstrung by a major "Google 10^100"-funded competitor ...

* Make educational content available online for free

How about helping people to demand more funding for education and research? They need to demand that taxpayer-funded research, and any published research for that matter, not be locked up in private journals and books. They need to fight the perception that education and research are an "entitlement" ... and promote the idea that everyone should receive an education, and benefit from the open disclosure of results, from the government they have themselves funded ...

* Create real-time natural crisis tracking system

First responders will tell you that they could develop what they need if they only had the resources to do so. But in our privatization-crazed society, this is unlikely: for example, the radio frequencies that used to be clear for emergencies have been sold to the entertainment industry.

* Make government more transparent

Wow. With technology? I think all the technology is already available ... what is needed, again, is funding to politically organize, and crack open government and corporate opacity.

* Help social entrepreneurs drive change

You know, small business creates change every day. Why not help communities to fund their own private, small-scale solutions to their problems? Why does it need to be national and global scale venture capital, with its sterilizing effect upon local life?

* Provide quality education to African students

Just find teachers and educators in Africa, and ask them what they need. They will tell you, invariably, that they need freedom from neo-colonial first-world powers stealing their resources and forcing governments upon them. If you want to help Africa, forget aid ... just stop the theft.

* Encourage positive media depictions of engineers and scientists

More PR? Science-fiction already has so many positive images of scientists that it has long parted ways from the reality of real-world science. The best way for people to learn to like science is to get involved in doing it.

* Build better banking tools for everyone

Where to start on this one? It is a transparent attempt to dress up the further encroachment of the financial industry into our lives, as opposed to making people independent of these rapacious corporations with their amoral behavior.

* Work toward socially conscious tax policies

Different taxing schemes? The problem is how the money is spent -- trillions for weapons and corporations, almost nothing for people.

* Collect and organize the world's urban data

This one doesn't even seem to have any goals. Here's a better one: collect data on the effect of technocratic government policies (like Urban Renewal) and corporate greed (like real estate speculation) on the quality of life of people in cities.

* Create genocide monitoring and alert system

We know where these things are happening. The problem is stopping them, including the collusion of our own government in the vicious and "expedient" disposal of "inconvenient people".

* Enhance science and engineering education

Again, fund the organization of political empowerment, bring democracy to this country, so that funding of education is part and parcel of daily life.

* Promote health monitoring and data analysis

The biggest problems in medicine are the payouts to corporations selling insurance and pharmaceuticals etc. Eliminate profit from health-care, and there will be plenty of money to improve it.

* Create real-world issue reporting system

Good grief. This assumes that the "appropriate authorities" care about doing the job they say they are doing! They don't, not even at the most visible levels of government. They care about the wealthy and the powerful, and the rules that support them, not about the majority of people. What we need to do is organize to force accountability from those in power. That means organizing all these complaints ... which are already easy enough to find ... into political action.

Wednesday, July 05, 2006

Google transport: A to B

In Google Earth, it's possible to ask for directions from Paris to St. Louis, but it gives you no results.

If all modes of transport were integrated into the system, Google would be able to capture the ticketing market from, say, Expedia & Ticketmaster ...

Tuesday, September 20, 2005

Geographical connection engine ...

I believe there are huge geographical databases already available to Google, or any search engine.

Let's call it a Map Rank, attached to a search rank, which would return a reasonable geographic approximation of text data.

Take, as an example, the search string "Nobel Prize Winners". Search through Wikipedia, find all Nobel Prize winners, rank them, find all places mentioned in each article, rank them by prominence in the article, and then draw balloons on the map. You'll get nice clustering, and snippets of text will appear when you click on the balloons, or when you zoom in close enough so that there are just a few.
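
The "rank places by prominence in the article" step might look something like the sketch below. Everything in it is an assumption for illustration: the tiny `GAZETTEER` stands in for a real geographical database, the article text is a toy sample I wrote for the purpose, and "prominence" is reduced to a simple mention count.

```python
from collections import Counter
import re

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Warsaw": (52.23, 21.01),
    "Paris": (48.86, 2.35),
    "Stockholm": (59.33, 18.07),
}


def rank_places(article, gazetteer=GAZETTEER):
    """Return (place, mentions, coords) tuples, most-mentioned place first."""
    words = re.findall(r"[A-Za-z]+", article)
    counts = Counter(w for w in words if w in gazetteer)
    return [(place, n, gazetteer[place]) for place, n in counts.most_common()]


# Toy article text, written for this sketch.
article = (
    "Marie Curie was born in Warsaw. She moved to Paris to study, "
    "and did most of her research in Paris. She received a prize in Stockholm."
)

for place, mentions, coords in rank_places(article):
    print(place, mentions, coords)   # one balloon per line
```

A real Map Rank would have to handle multi-word place names, ambiguous names, and weighting by position in the article, none of which this attempts.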

Now, imagine multiplying this over dates. Dates are found in the articles, and become milestones along a timeline. Type in a time window, and a map is shown, with a slider on a timeline -- only those Nobel Prize winners with dates within the slider window are shown.

Make some assumptions about time & space in an article: like "if a year and a place are mentioned in the same phrase, there's an association". Then you could show the movement of those Nobel Prize winners, according to the available information.
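
Here is a rough sketch of that year-and-place assumption, together with the slider window. It treats a sentence as the "phrase", which is my simplification, and the `PLACES` set and the sample text are made up for illustration.

```python
import re

# Hypothetical set of known place names (a real system would use a gazetteer).
PLACES = {"Warsaw", "Paris", "Stockholm"}


def associate(article, places=PLACES):
    """Pair every year with every place mentioned in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", article):
        years = [int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", sentence)]
        found = [w for w in re.findall(r"[A-Za-z]+", sentence) if w in places]
        pairs.extend((year, place) for year in years for place in found)
    return sorted(pairs)


def in_window(pairs, start, end):
    """Keep only the associations that fall inside the slider window."""
    return [(year, place) for year, place in pairs if start <= year <= end]


# Toy article text, written for this sketch.
article = (
    "Marie Curie was born in Warsaw in 1867. She moved to Paris in 1891. "
    "In 1911 she travelled to Stockholm to receive her second prize."
)

timeline = associate(article)
print(timeline)                         # the movement, in date order
print(in_window(timeline, 1880, 1900))  # what the slider would show
```

Sorted by year, the pairs already read as a crude itinerary. Animating them on a map is then a matter of drawing.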

Open up other databases, for population, poverty, commerce ... add old phone books from around the world, and the research libraries currently being digitized, and you will find that this "geographical connection engine" tells you a lot about the real world you would not have easily guessed.

If you animate some of these searches, you can watch the ebb and flow of history. It will look very ... geological. And, OK, it might spawn a new kind of cliometric physics, somewhat reminiscent of Hegel ...

Tuesday, June 28, 2005

Just a matter of time ... ?

I use Linux & Macs, so I've never tried the Windows-only Google Earth download, the former Keyhole product.

But a Google-advisor writes, first quoting my e-mail:

Take the "State of the World Atlas" ... imagine if this data was available at all scales, as a GIS annotation layer upon Google maps. A Google News cluster could have a geocoded story link to Google Maps, with the appropriate uncontroversial layers (population, migration, poverty, jobs, pollution levels, trade, transport, social services etc.) enabled.

And then writing:

"I've just downloaded the new Google Earth, and it has the basic structure for doing this. It provides a menu of "layers" of different kind of info, and has provision for people outside of Google to provide the information for "User provided" layers (I see there is one for some UNESCO WHS data, but don't see it on the map). There is currently only low resolution outside of US, but presumably that will get better. Is this the kind of thing you're thinking of?"

They're working on a Mac version of Google Earth. So I'll see it someday.

But I'd like to see these layers in Google Maps. I'm sure the Google digital map groups are closely allied. So it may be just a matter of time. But, some demand would help move development up the queue.

Then, the next step: create contextual map links near news stories. Keywords in Google News clusters can be automatically weighed against canonical descriptions of the GIS layers, as well as the map data labels. The right coverage area would be found, and layers could be pre-loaded, but turned off, with a list of checkboxes, so that the user can enable them.
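
To make the "weighed against canonical descriptions" idea concrete, here is a minimal sketch. The layer catalogue, its descriptions, and the scoring rule (the fraction of story keywords that appear in a layer's description) are all my own assumptions. Nothing here reflects how Google News or Google Maps actually work.

```python
# Hypothetical layer catalogue: layer name -> canonical description.
LAYERS = {
    "population": "population density people census residents",
    "pollution": "pollution air water emissions toxic levels",
    "transport": "transport roads rail transit traffic ports",
}


def score_layers(keywords, layers=LAYERS):
    """Score each layer by the fraction of story keywords in its description."""
    keys = {k.lower() for k in keywords}
    scores = {}
    for name, description in layers.items():
        vocab = set(description.lower().split())
        scores[name] = len(keys & vocab) / len(keys) if keys else 0.0
    return scores


def suggest(keywords, layers=LAYERS):
    """Return relevant layers as (name, enabled) pairs, best match first.

    Every layer comes back with enabled=False: pre-loaded, but turned off,
    so the reader decides which checkboxes to tick.
    """
    scores = score_layers(keywords, layers)
    relevant = [name for name, score in scores.items() if score > 0]
    ranked = sorted(relevant, key=lambda name: -scores[name])
    return [(name, False) for name in ranked]


story_keywords = ["emissions", "traffic", "air", "city"]
print(suggest(story_keywords))
```

Layers with no keyword overlap are left off the list entirely, which keeps the checkbox panel short.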

This would change the nature of discourse about the world ...

Map search broken

I'd like to look at a map of the western hemisphere, type "UN headquarters", and get something like this.

Unfortunately, it doesn't work yet! Google Maps is a local search, based on a small area around the center of the map, rather than on the whole map. If I don't know where in the world UN headquarters is, then I'm out of luck trying to find it in this way.

The search must be weighed against the coverage of the map. At present, you get this, which makes no sense.

[To get this 'demo', I found the UN in NYC, re-centered the map, then clicked "link to this page".]
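
What I mean by weighing the search against the coverage of the map can be sketched as below. The index entries, their text-match scores, and the bounding box for "the western hemisphere" are invented for illustration, and the coordinates are approximate. The point is only that the filter uses the whole visible area rather than a small radius around the center.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Visible map area in degrees. Assumes it does not cross the antimeridian."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Hypothetical search index: (name, latitude, longitude, text-match score).
INDEX = [
    ("UN Headquarters, New York", 40.75, -73.97, 0.9),
    ("UN Office at Geneva", 46.23, 6.14, 0.8),
    ("UN Office at Nairobi", -1.23, 36.82, 0.7),
]


def search_in_view(index, view):
    """Return matches anywhere in the visible area, best text score first."""
    hits = [row for row in index if view.contains(row[1], row[2])]
    return sorted(hits, key=lambda row: -row[3])


# Rough bounding box for a map showing the western hemisphere.
western_hemisphere = Box(south=-60.0, west=-170.0, north=75.0, east=-30.0)

for name, lat, lon, score in search_in_view(INDEX, western_hemisphere):
    print(name, lat, lon)
```

Zoom out to the whole world and all three offices would qualify. Zoom in on Manhattan and only one would.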

Monday, June 27, 2005

An interface for your conclusions

Imagine GIS layers that overlay Google maps, automatically selected for uncontroversial relevance to a query.

Say that a Google News article was about Nuclear Weapons. A Google GIS link would appear with the story. Click it, and a Map would appear, with checkboxes on the right, unchecked, representing the unweighted layers that the system considers relevant.

* national data: size of a government's nuclear arsenal
* dates & locations of a government's nuclear tests
* national data: dates & estimates of budgets dedicated to nuclear weapon development
* dates & locations of use of nuclear weapons in wartime
* dates, locations & number of nuclear weapons stationed, by government
* national data: international treaties signed, not signed

I stayed away from controversial, editorial keywords & sources, such as the US government lists of allies, on one side, and the Bulletin of the Atomic Scientists' lists of dangerous acts, on the other. I think, based on the uncontroversial hard data, people will see for themselves who the most dangerous nuclear power on Earth is.