Hands-on with TensorFlow on GCP

Following is my experience with the Google Cloud Platform (GCP). I am already familiar with Amazon's Elastic Compute Cloud (EC2), so this investigation will help me decide which platform better suits what I need for my own terraAI project.

The following was recorded in October 2016. Since GCP and TensorFlow are likely to evolve quickly, I expect that this information could become outdated fairly soon.

Please note that I am approaching this from the point of view of using GCP+TensorFlow for Machine Learning research, and not general computing. However, it is likely that a good part of this could be useful to anyone who wants to use GCP for other purposes.

The TensorFlow version tested as of this writing was 0.11.

Why GCP?

Several reasons prompted me to look into GCP:

  1. GCP explicitly supports scalable Machine Learning services through TensorFlow, which seems very useful for machine learning tasks that require a lot of computing power.
  2. GCP offers credits of $20,000-$100,000 for startup companies. You might want to look into it to see if it is applicable to you.
  3. Google is offering a two-month free trial with USD$300 credit.

Doing the Free Trial
  1. Go to GCP's homepage, click on the TRY IT FREE button and follow the instructions there (ref: GCP docs) to set up the basic environment. It took perhaps an hour to get everything set up, which was a little tedious, but overall the instructions were pretty clear.

  2. Set up an instance for testing out the Google Cloud Machine Learning API (GCML). Note that this involves enabling the relevant API from the Cloud Shell. Cloud Shell is basically a browser-based terminal console for your server instance, which is based on Debian Linux.
    Note that during this process you will be creating a GCP bucket for persistent storage (similar to Amazon S3).

  3. Test out GCML using the Training Quickstart, which is a canned example using the MNIST dataset.
    Training: running this test example takes only a couple of seconds through the initial Cloud Shell (i.e., without submitting the task to a new VM instance).
    Inspection: a tool called TensorBoard is available for inspecting the ML model and results. It is launched as a web server by typing the command "tensorboard --logdir=data/ --port=8080" at the Cloud Shell command line; the UI (a browser client) can then be opened by clicking on the Web preview button on the Cloud Shell menu. TensorBoard shows the model's computation graph and supports interactive inspection and manipulation, which is very nice. More information about this TensorFlow tutorial can be found here.

Quick Observations

  1. No GPU instances are available at this time.
  2. The available instance tiers are not as extensive as EC2's.
  3. The Cloud Shell often feels sluggish: merely echoing the commands you type can take a second or two, which is quite annoying.
  4. The pricing for standard GCP bucket storage is USD$0.026 per month per GB, which is roughly comparable to Amazon's S3 (which is at USD$0.03 per month per GB).
  5. What you can access through the Cloud Shell is essentially a very small VM instance that is allocated automatically for you. Although you can probably do a wide array of things with it (e.g., installing packages, running programs, etc.), you should use the Cloud Shell principally as a management console, since it offers only very limited computing capacity and is also ephemeral.
  6. For serious local computing that is not yet ready for the cloud, you should create a real instance, which I will discuss later.

Setting up a new VM instance

The FREE TRIAL process above gave me a quick taste of what GCP offers, so my next step is to create a new virtual machine instance for more serious computing. My purpose here is to develop and train a Machine Learning model, so the instance should have reasonable computing power.

A default "real" instance will cost around USD$30 per month. Such an instance can be stopped when not needed, in which case it costs little, since you then pay only for the storage required to keep it.

The configuration and cost for the default VM instance created are as follows:

  1. 1 vCPU with 3.75 GB memory @ nominal cost of $36.50/month
  2. 10 GB standard persistent disk $0.40/month
  3. Sustained use discount: -$10.95/month
  4. Estimated total: $25.95 per month, for an effective hourly rate of $0.036 (assuming 730 hours per month)
  5. Debian GNU/Linux 8 (jessie)
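
For reference, an equivalent instance can also be created from the command line with the Cloud SDK instead of the web console. The following is a minimal sketch; the zone is a placeholder I chose, and instance-1 simply matches the instance name used later in this post:

    # Sketch: create a small Debian 8 instance roughly matching the configuration above
    gcloud compute instances create instance-1 \
        --zone us-central1-b \
        --machine-type n1-standard-1 \
        --image-family debian-8 \
        --image-project debian-cloud \
        --boot-disk-size 10GB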

Somehow trying to set up a real VM instance for the ML package turned out to be much more tedious than I had expected.

Following is what I had to go through:

  1. It wasn't clear how to do this from the GCP Dashboard after I was done with the quick trial, so I had to google separately to find the instructions. Click on the 'LOCAL:MAC/LINUX' tab there, and find the first step "Install Miniconda for Python 2.7".
  2. Launch a browser shell from a selected instance on the GCP Compute Engine console. This shell is annoyingly slow but it will do for now since I had trouble setting up SSH access (see below).
  3. Follow the directions (see the "Linux Anaconda install" section) to download the installation script Anaconda-latest-Linux-x86_64.sh
  4. Executed the installation script and got an error message that bzip2 is missing.
  5. Executed 'sudo apt-get install bzip2' to get it installed, then ran the Anaconda installation again.
  6. Got an error message that the directory 'anaconda2' already exists. Removed the directory, then executed the installation script again.
  7. Closed and reopened the shell as instructed to get the installation to take effect. This got Anaconda installed successfully.
  8. Now back on the "Setting up your environment" page, executed SIX more steps to get various components installed. For the last step, which installs TensorFlow, I got the following error message:

    Installing collected packages: funcsigs, pbr, mock, setuptools, protobuf, tensorflow
    Found existing installation: setuptools 27.2.0
    Cannot remove entries from nonexistent file/home/kaihuchen01/anaconda2/envs/cloudml/lib/python2.7/site-packages/easy-install.pth
    
  9. At this point, invoking python and trying 'import tensorflow' got 'ImportError: No module named tensorflow', so the installation obviously failed.
  10. Googled around for solutions and found this. Following the "frankcarey commented on Jan 9" entry solved the problem. This is great, but we are not done yet.
  11. Next do the step Install and initialize the Cloud SDK using the instructions for your operating system. There are 14 (!) steps there.
  12. Moving along, I reached the step gcloud components install beta and got the following error message:

    You cannot perform this action because this Cloud SDK installation is managed by an external package manager.  If you would like to get the latest version, please see our main download page at:
    https://cloud.google.com/sdk/ ERROR: (gcloud.components.install) The component manager is disabled for this installation
    

    I decided to just ignore it and move on.

  13. On this step 'curl https://storage.googleapis.com/cloud-ml/scripts/check_environment.py | python' I got the following error message:

    ERROR: Unsupported TensorFlow version: 0.6.0 (minimum 0.11.0rc0)
    

    Perhaps the TensorFlow installation failed earlier. Reinstalling it somehow made the problem go away.

  14. Got an error when trying to verify with import tensorflow in python. Googled around and found this; using 'pip install -U protobuf==3.0.0b2' solved the problem.

  15. At this point I finally got 'Success' at the 'Verifying your environment' step.
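
As a quick sanity check (not one of the official steps, just something I find handy), you can confirm which TensorFlow build Python actually picks up:

    # Quick check that the environment sees the expected TensorFlow build
    python -c "import tensorflow as tf; print(tf.__version__)"
    # this should print 0.11.x here; an ImportError means the installation is still broken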

Seriously, Google? This whole process really needs to be made much simpler.

Google Cloud SDK

The Cloud SDK is described as follows:

The Cloud SDK is a set of tools for Cloud Platform. It contains gcloud, gsutil, and bq, which you can use to access Google Compute Engine, Google Cloud Storage, Google BigQuery, and other products and services from the command-line. You can run these tools interactively or in your automated scripts.

While it is not stated explicitly, this in fact allows you to manage GCP from your own computer, instead of doing so from the Cloud Shell through the browser. You don't need to install this if your purpose is to do a quick test.

Generating SSH key pair

For real work it is vital to be able to transfer files to/from GCP (e.g., using a tool like winscp), as well as being able to use a native SSH client (e.g., putty) for accessing GCP. This is not needed if your purpose is to do a quick trial of GCP.

The first step for doing FTP and SSH is to generate an SSH key pair. The GCP documentation describes two ways of doing it.

Generated SSH key pair using puttygen (failed)

This involves using a native puttygen program on my Windows PC.

Following the "To generate a new SSH key-pair on Windows workstations" steps in this instruction, I got the following error message:

Invalid key. Required format: <protocol> <key-blob> <username@example.com> or <protocol> <key-blob> google-ssh {"userName":"<username@example.com>","expireOn":"<date>"}

Abandoned.

Update (2016.10.17): based on the experience below with setting up FTP and SSH, it is likely that this failure was due to the key pair generated in puttygen not being of the right type. However, the instructions are not clear in this respect. Overall it is still easier to do this on the VM instance (see below), since you then do not have to bother with the relevant configuration on the VM instance.

Generated SSH key pair on the VM instance (works!)

  1. Open a browser-based session to the target VM instance.
  2. Issue the command gcloud compute ssh instance-1, where instance-1 is the name of my instance. It reports that there is no SSH key for Google Compute Engine, and proceeds to generate an RSA key pair.
  3. Enter passphrase.
  4. In the end it reports: ERROR: (gcloud.compute.ssh) Could not SSH to the instance, which I simply ignored.

This is the key pair that we are using for the FTP and SSH setups below.
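
In other words, the sequence on the VM instance boils down to roughly the following (the key file paths are the gcloud defaults as I understand them):

    # Run once from a browser-based session on the VM; gcloud generates the key pair
    gcloud compute ssh instance-1       # prompts for a passphrase and writes the keys
    # The keys end up under ~/.ssh/ on the instance
    ls ~/.ssh/google_compute_engine*    # private key plus the .pub public key
    cat ~/.ssh/google_compute_engine    # display the private key for cut-and-paste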

Accessing GCP through Winscp

For setting up winscp (my favorite FTP client), do the following:

  1. Download the private key generated above from ~/.ssh/google_compute_engine on the instance. Actually I just displayed it in the console and cut-and-pasted it into a local file.
  2. Use puttygen to convert the private key file from SSH2 format to PUTTY format (as required by winscp)
  3. Create an access entry in winscp, using the private key above, as well as the instance's external IP address.
  4. Connect to server instance

Key caching: If you created your key pair with a passphrase then you will be prompted to enter the passphrase every time you connect through Winscp or an SSH client. This can be quite annoying, since such sessions time out quite often and you are forced to re-enter the passphrase each time.

A good solution for this problem is to use a tool that caches the private key, such as Pageant.

Accessing GCP through SSH
  1. Use puttygen to convert the private key file from the old SSH2 format to a more updated format (as required by KiTTY or PUTTY)
  2. Create an access entry in KiTTY and configure it with the private key above, as well as the instance's external IP address.
  3. Connect to server instance.

Accessing Cloud Storage from Python

Google Cloud Storage offers persistent storage similar to Amazon's AWS S3. Creating a GCP bucket (variously called a 'bucket' or a 'disk' in GCP) is easy from the GCP admin console. Here my goal is to set it up so that I am able to access my bucket from a Python program. This turned out to take more effort than I had expected.

Following is a log of my quest:

  1. First we need to install the client library for Python. The starting point is this page: Google API Client Library > Python
  2. The above page points to this page, which instructs me to execute the command pip install --upgrade google-api-python-client; this went without a hitch.
  3. At this point if I try to execute python -c "import cloudstorage" I get the error message 'ImportError: No module named appengine.api'. Does this mean that somehow the Cloud SDK hasn't been installed correctly? Trying to reinstall it following the directions did not help at all.

Abandoned for now.
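
A hedged note in hindsight: the 'cloudstorage' module appears to come from the App Engine client library (which is why it reaches for appengine.api), rather than from google-api-python-client, so the failure above may not mean the Cloud SDK is broken at all. A quick way to confirm that the client library itself installed correctly:

    # google-api-python-client installs its modules under the name 'googleapiclient'
    python -c "from googleapiclient import discovery; print('google-api-python-client OK')"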

Mounting Cloud Storage as a local file system

So how can we mount a cloud bucket as a local file system on a given instance? Doing so is vital for my target setup, where many VM instances are created and terminated as needed but share persistent storage for all common information.

Following are the steps:

  1. Follow the steps here to create and mount GCP cloud storage.
  2. At some point it indicates the need to install the gcsfuse tool on the instance, which leads to the instructions here. Beware that where the instructions call for the gsutil command, it may report an error unless sudo gsutil is used instead.
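
For reference, the condensed install-and-mount sequence looks roughly like the following on the Debian jessie image. Treat this as a sketch of the official gcsfuse instructions rather than a substitute for them; the bucket name and mount point are placeholders:

    # Add the gcsfuse apt repository and install the tool (per the gcsfuse docs)
    export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
    echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | \
        sudo tee /etc/apt/sources.list.d/gcsfuse.list
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    sudo apt-get update && sudo apt-get install gcsfuse

    # Mount an existing bucket at a local mount point (placeholder names)
    sudo mkdir -p /mnt/mybk
    gcsfuse my-bucket /mnt/mybk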

After this is done it is then possible to do things such as having multiple VM instances access shared data sets stored in a bucket. However, please beware of the following pitfall (excerpt from the instruction page):

Note: Cloud Storage is an object storage system that does not have the same write constraints as a POSIX file system. If you write data to a file in Cloud Storage simultaneously from multiple sources, you might unintentionally overwrite critical data.

Persisting the mounted drive

The above instructions only get a bucket mounted for the current session, which means that it will not get mounted automatically the next time the instance is rebooted, which is not good. GCP does not make this information readily available, so I had to google around to find out how to achieve this.

Found some information here. It seems that by default gcsfuse allows only the user who mounts the file system to access it, but when you put an entry in fstab to get it to auto-mount on reboot, the root user will own the file system, thus preventing others from accessing it.

The following is what works and the lessons learned:

  1. Edit /etc/fstab to auto-mount a bucket on reboot, but do not just edit it and then reboot the system, since you might brick the instance and have to ditch it. Better to create a snapshot before you do this.
  2. For me the following fstab entry works (until I rebooted the system, that is):

    console-xxxxxx.appspot.com /mnt/mybk gcsfuse rw,user,allow_other
    

    where console-xxxxxx.appspot.com is the GCP name for my bucket, and /mnt/mybk is the mount point on my file system. The allow_other flag allows root to do the mounting while still letting other users access the bucket.

  3. Do not just reboot in order to test the updated fstab, since you may brick the whole instance if there is something wrong with the change. Instead use the command sudo mount -a to test it out first.
  4. Up to this point everything worked well for me, i.e., I was able to get the bucket mounted through a new entry in fstab and verified with sudo mount -a (i.e., without rebooting). Files in the bucket were accessible as expected. However, the instance was bricked as soon as I rebooted it. I tried this three times, all with the same result.

There is something called the serial console which is useful for limited diagnosis. This console is accessed from the VM Instances dashboard: find the instance in question, click on the SSH dropdown menu at the far right and select View gcloud command; you will then be able to see the system's boot log and get some sense of what's going on.

With help from the Google Cloud forum, following are the steps to make it work:

  1. You must have the 'noauto' flag in the fstab entry, otherwise the system is going to hang on reboot. You also need the dir_mode and file_mode options there, otherwise the files in the bucket won't be writable. The fstab entry that works looks like the following:

    bucket-name mount-point gcsfuse rw,noauto,user,allow_other,file_mode=777,dir_mode=777 0 0

  2. Add an entry mount mount-point in /etc/rc.local file to get the bucket mounted on reboot.

Here is some relevant information about the permissions and ownership of the gcsfuse file system:

By default, all inodes in a gcsfuse file system show up as being owned by the UID and GID of the gcsfuse process itself, i.e. the user who mounted the file system. All files have permission bits 0644, and all directories have permission bits 0755 (but see below for issues with use by other users). Changing inode mode (using chmod(2) or similar) is unsupported, and changes are silently ignored.

Solution finally

With help from Google support (see here), following is the correct solution:

  1. Upon creating a new instance, the full storage access scope must be specified (a consolidated sketch of the whole setup follows this list).
  2. The file /etc/fstab needs to contain the following entry:

    bucket-name mount-point gcsfuse rw,noauto,user,allow_other
    
  3. The rc.local needs to contain the command mount mount-point so that the bucket will get mounted on reboot.
  4. Modifying a file in the bucket will require the use of 'sudo', otherwise permission will be denied.
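
Putting the pieces together, the working setup looks roughly like this. This is a hedged sketch: the instance, bucket, and mount-point names are placeholders, and --scopes storage-full is the gcloud alias that, as I understand it, grants full Cloud Storage access:

    # 1. Create the instance with the full storage access scope (placeholder names)
    gcloud compute instances create instance-1 --scopes storage-full

    # 2. /etc/fstab entry ('noauto' so boot does not hang; the mount happens from rc.local):
    #      bucket-name mount-point gcsfuse rw,noauto,user,allow_other
    # 3. /etc/rc.local entry:
    #      mount mount-point

    # 4. Test by hand before rebooting; with 'noauto' a plain 'mount -a' will skip the entry
    sudo mount mount-point && ls mount-point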

One loose end is that it is still impossible to make a file executable (e.g., for a shell script), since sudo chmod simply fails silently.

Adding or Resizing local disk

Can you resize the local disk attached to a VM instance, or add more disks? This is quite important, since you might start an instance at 10GB and later find that you need more space for the large datasets required by ML computation.

Fortunately the answer is yes, and it can be done quite easily through the Compute Engine console (see the instructions here).
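
If you prefer the command line, I believe the rough equivalent is the following (a sketch only; the disk name and size are placeholders, and depending on the image the filesystem may still need to be grown inside the instance afterwards):

    # Grow the persistent disk itself (placeholder disk name and size)
    gcloud compute disks resize instance-1 --size 50GB

    # If the root filesystem does not pick up the new space automatically,
    # grow it from inside the instance (the device name may differ)
    sudo resize2fs /dev/sda1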

Reserving static IP Address

If there is a need to set up a server on GCP that accepts requests over the Internet, then it is vital to have a static IP address. The method for reserving a static IP address can be found here.

Reserving a global static address may incur more charges.

The pricing for reserved static IP addresses can be found here.
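
For reference, a regional static address can also be reserved from the command line; a minimal sketch (the address name and region are placeholders):

    # Reserve a regional static external IP address (placeholder name and region)
    gcloud compute addresses create my-static-ip --region us-central1

    # List reserved addresses to confirm
    gcloud compute addresses list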

Managing ML jobs

Machine Learning jobs usually take a very long time to complete. Following are some techniques that worked well for me. Some of these are not specific to GCP or ML, but are nonetheless kept here for reference:

  1. The Linux Screen tool.
    This tool allows me to start a long-running job in a virtual 'window', detach from it to do other things or even close the SSH session, then come back later to re-attach to the same virtual window and check on the progress. Such operations involve only keyboard commands and are much easier than the alternatives. Without such a tool I would either have to run the job manually from a terminal and risk accidental termination of the session, or have to do some more complex setup.
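
The following is a minimal sketch of that workflow; the session name and training command are just examples:

    # Typical Screen workflow
    screen -S train           # start a named virtual window
    python train_model.py     # kick off the long-running job inside it (example command)
    # press Ctrl-a then d to detach; the job keeps running even if the SSH session drops
    screen -ls                # list existing sessions if you forget the name
    screen -r train           # later: re-attach to check on progress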

Multiple Instance Setup for ML

When using a hosting service (such as GCP or AWS/EC2) for ML computing we face a dilemma: the server instances are charged by the hour and capacity, and a large instance is great for sustained heavy computation but too expensive if we are just spending time tweaking a model manually.

One way to deal with this is with the following setup:

  1. Create a small instance and configure it with all the software packages needed for the task. This is where you'd do all manual tasks that do not require a lot of computing power.
  2. Set up a persistent bucket (e.g., GCP's Cloud Storage, or Amazon AWS/S3) and mount it as a local file system on the small instance. Put all of the code and data there. One side benefit of doing this is that you now also have almost unlimited storage space, without having to allocate new disks on the instance when you run out of space.
  3. When the configuration for the small instance is stable, create a snapshot out of the small instance, then use it to launch a large instance that has a lot more computing power. This way you then do not have to go through the same tedious configuration process again.
  4. Also mount the same bucket on the large instance as a local file system. This way all instances involved will have a shared storage space.

Given the above setup, the following is then the typical workflow:

  1. You'd normally keep the large instance (or instances) shut down, so it costs you very little.
  2. For all the time-consuming manual coding, tweaking, and exploration you'd do them on the small instance, which does not cost too much.
  3. When you are ready to train a ML model, you can then start the large instance(s), and all the latest code and data will be already there in the bucket for you to start the training. Make sure that the training output is stored in the shared bucket.
  4. Create a script on the large instances so that they shut down automatically at the end of a long-running training session (a sketch is shown after this list). This way they won't cost you money while doing nothing for you.
  5. Since the result is placed in the bucket, you can then inspect them from any instance at your leisure.
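
For item 4 above, a minimal sketch of such a wrapper script (the training command and log path are made-up examples; shutting the instance down is what stops the compute charges):

    #!/bin/bash
    # run_and_shutdown.sh -- hypothetical wrapper around a long training run
    python train_model.py > /mnt/mybk/train.log 2>&1   # write output/logs to the shared bucket
    sudo poweroff                                       # stop the instance when training is done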

This way the large instances are used in an on-demand fashion, which should reduce the cost a lot. If you use the preemptible or spot instances offered by the hosting service, that should reduce your cost even further.

I have verified that the above setup works well on GCP. One caveat is the warning mentioned in the "Mounting Cloud Storage as a local file system" section above regarding concurrent write operations into a shared bucket.

Creating a server image

On Amazon's EC2 I was able to create an AMI image from one of my server instances, then use it to spawn another server instance. This is very useful since otherwise I would have to go through a tedious configuration process for each new copy of the server instance.

On GCP this can be achieved from the Compute Engine VM Instances dashboard by creating a snapshot of an existing instance. The snapshot can then be used to launch a new VM instance, with different capacity (say, with 8 virtual CPUs and more disk space) if needed.
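
From the command line the snapshot-and-clone flow looks roughly like the following (a sketch; the disk, snapshot, and instance names as well as the machine type are placeholders):

    # Snapshot the small instance's boot disk
    gcloud compute disks snapshot small-disk --snapshot-names ml-snap

    # Create a new disk from the snapshot and boot a larger instance from it
    gcloud compute disks create big-disk --source-snapshot ml-snap
    gcloud compute instances create big-vm \
        --machine-type n1-standard-8 \
        --disk name=big-disk,boot=yes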

The tests I conducted worked well on GCP. In particular I have checked out the following with success on the new VM instance:

  1. All installed packages are present and function correctly.
  2. The shared bucket is mounted as expected, so it can be accessed immediately.
  3. The TensorFlow code that I placed in the shared bucket can be executed with no problem.
  4. The SSH key pair is installed on the new VM instance, so I could SSH into the new instance without additional work.

Submitting a Training Job (unfinished)

Cloud ML is a managed service on GCP that supports scalable training and deployment of large Machine Learning jobs. Following is the official description:

Google Cloud Machine Learning is a managed service that enables you to easily build machine learning models, that work on any type of data, of any size. Create your model with the powerful TensorFlow framework that powers many Google products, from Google Photos to Google Cloud Speech. Build models of any size with our managed scalable infrastructure. Your trained model is immediately available for use with our global prediction platform that can support thousands of users and TBs of data. The service is integrated with Google Cloud Dataflow for pre-processing, allowing you to access data from Google Cloud Storage, Google BigQuery, and others.

Cloud ML supposedly offers many benefits. My limited goals here are: to check out the basic mechanism of submitting a Cloud ML job, to see whether I am able to train a TensorFlow model much faster through it (as opposed to training the model on my own VM instance), and to get a sense of the cost associated with it.

Note that the Cloud ML API came with the following warning (as of October 2016):

*Beta: This is a Beta release of Google Cloud Machine Learning API. This API might be changed in backward-incompatible ways and is not recommended for production use. It is not subject to any SLA or deprecation policy.

I started by following the instructions for gcloud beta ml jobs submit training, which did not get very far. The instructions are terse and without examples, and many things are left unclear. For example, it is entirely unclear what should be included in the required tar.gz files.

I will update this post with more information later when better instructions become available.

Tensor Processing Unit (TPU)

What about the TPU that Google announced in May 2016, which was used in the AlphaGo system and held great promise in speeding up deep learning tasks? Unfortunately there is no sign of it on GCP as of November 2016, and it is entirely unclear when we will see it on GCP and at what cost.

Support Forum

While looking for solutions to my Python problems here, I found a support link that leads to a forum which seems to have fairly low traffic. I did get prompt and informative responses which eventually helped me resolve my issues.

Annoyances

There were some annoyances during my tests:

  1. The console terminal used for accessing an instance, either through the browser-based terminal or through a native SSH client, is frustratingly slow. Even just pressing the "ENTER" key could take a couple of seconds for it to respond. This is on an instance that has nothing else running on it at the time. This is quite unacceptable.
  2. Cloud storage performance: copying files between an instance's disk and a bucket is excruciatingly slow. Just copying a directory of a ghost blog system (~186MB) took hours. In a separate test I tried to un-tar a large ML dataset directly in a bucket, without copying between the local disk and the bucket. The dataset in question was the CSTR VCTK Corpus, which contains many tiny text files (aside from audio samples). Visual inspection showed that such tiny text files were extracted into the bucket at a rate of roughly 40 files per minute. One training dataset that I need for a Machine Learning job contains 200,000 files, which means that just unzipping it into a bucket would take 5,000 minutes, or more than three days! In comparison, unzipping the same dataset on an instance's local disk took only 25 seconds.
  3. Auto-mount cloud storage: I had difficulty getting a mounted bucket to survive a reboot. I needed this so that I do not have to remount it manually for every session. Without it, it is impossible to set up my target environment of multiple preemptible servers with shared buckets among them, which would let me take advantage of the much lower cost of preemptible servers. It took me some effort to get this to work.
  4. Recovery: I somehow bricked several VM instances, likely by trying to add an entry to /etc/fstab in order to get a GCP bucket remounted automatically upon reboot. The VM Instances dashboard showed the instance as up and running normally, with a green check mark and no error messages, but it was not possible to connect to it by normal means (Winscp, SSH, Cloud Shell, browser-based SSH windows, etc.). I could not find any way to roll back the change, so I eventually had to resort to deleting the instance.
  5. I wanted to assign a previously reserved static external IP address to a new instance, so that I don't have to update the many Winscp and SSH scripts I had set up. This was pretty easy to do using an Elastic IP in AWS, but in GCP it turned out to be more confusing. The UI for dealing with static IPs kept telling me "Quota 'STATIC_ADDRESSES' exceeded. Limit: 1.0" while I already had two previously allocated static IP addresses. It was entirely unclear from the Dashboard how I could get past this.
    In the end I found through experimentation that I am able to reserve only one static IP address per region/zone, and the only way to reuse one is as follows:
    1. Create a new instance in the same region/zone as the old instance where the static IP was assigned. The following won't work if the two instances are in different regions/zones.
    2. Detach the static IP address from the old instance from the VM Instances dashboard by changing its External IP to Ephemeral.
    3. Assign the static IP address for the new instance. This can be achieved only from the Networks dashboard, and not from the VM Instances dashboard.
  6. Problems with cloud storage. It took a little while for it to become obvious to me that files in GCP storage are not normal Linux files, and in many cases they require special handling. While AWS S3 files are also not normal files, S3 is certainly not as finicky or slow as GCP's storage. For example:
    1. I was unable to change the access permission of a file in the bucket, such as when trying to give a shell script the 'execute' permission. Doing chmod on the file simply has no effect, and there is no warning message explaining why it failed. Changing bucket permissions from the Storage dashboard has no effect either.
    2. You cannot move or rename files/directories like normal Linux files/directories. You will need to use special commands for this, as per the instructions here. Such commands can be quite tedious to use if you happen to have a very long bucket name, as in my case where I took the default assigned by GCP (e.g., console-xxxxxx.appspot.com). I found that in many cases (if the directory/file is not too big) it is actually much easier to just use winscp to do the copying.
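
As an example of the kind of special command involved (the object names here are made up; the bucket name is the GCP-assigned default mentioned above):

    # Rename/move an object inside the bucket with gsutil instead of mv
    gsutil mv gs://console-xxxxxx.appspot.com/old_name gs://console-xxxxxx.appspot.com/new_name

    # Copy a whole directory tree between the local disk and the bucket
    gsutil -m cp -r ./mydir gs://console-xxxxxx.appspot.com/mydir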

Conclusions

The GCP is mostly very well done. The prospect of easily achieving large-scale ML computing through TensorFlow on the Google Cloud Platform is also quite appealing.

From the perspective of simple hosting and cloud storage, price-wise the GCP is roughly competitive with Amazon's AWS.

For the sake of running Machine Learning jobs I really wish that GCP had TPU or GPU support. Even running with 8 vCPUs (the largest configuration available under the free trial) is still not adequate for Machine Learning, and it often takes days for me to complete even modest training tasks.

On the downside, sorting out the setup problems mentioned above took me practically a whole day, which was much more than I had expected. There are also some nagging issues, such as auto-mounting and access permissions for GCP storage, that took quite a bit of back and forth with the GCP support people (who were very helpful) to figure out.

Overall, setting up a VM instance with the ML package was unnecessarily complicated, forcing me to go through dozens of steps with many pitfalls. GCP should have just created a number of pre-configured ML server images for users to choose from, so that setting it up would be a matter of making a few simple configuration choices and could be done in minutes instead of hours. Perhaps this is what you get with an ML package still in the beta phase, and I trust that it will get better over time.

Microsoft Azure also offers many Machine Learning packages, and AWS has good support for various GPU instances, so what advantages does GCP have over Amazon AWS or Microsoft Azure for Machine Learning? GCP is a natural choice for my ML needs for the following reasons:

  1. My immediate goal is mainly to conduct Machine Learning research, not to use existing Machine Learning packages intended mostly for the business community.
  2. Lately many of the leading-edge Machine Learning papers have come from Google's DeepMind group, which also kindly releases much of its source code, often implemented in TensorFlow, and GCP has better support for TensorFlow.
  3. TensorFlow appears to be in the leading position for supporting large-scale deployment of Machine Learning applications.
  4. Google does a very good job about making TensorFlow available to the research community.

If you need to do large-scale Machine Learning either for research or business, GCP+TensorFlow holds the promise of being one of the best choices. While it is possible to install TensorFlow on AWS with multiple GPU instances for meeting such needs, at this time it is likely to require more work in scaling and configuring the setup. The caveat here is that the high-end scalability of the GCP+TensorFlow is not yet readily obvious in my limited tests.

Related posts:

  1. Machine Learning on Google Cloud and AWS/EC2, Hands-on: practical issues about running computing-intensive jobs on GCP or AWS/EC2. This uses DCGAN (Deep Convolutional Generative Adversarial Networks) as a test case.
  2. Image interpolation, extrapolation, and generation: looking into the possibility of using DCGAN for the purpose of generating images (and eventually 3D models) from textual commands. This is part of the How to Build a Holodeck series.
  3. The terraAI manifesto